ON VAGUENESS AND FICTIONS AS CORNERSTONES OF A THEORY OF PERCEIVING AND
ACTING: A COMMENT ON WALTER (1983)*


Claudia Carello† and M. T. Turvey††


"I  don't  want  realism.  I  want  magic!" 

Blanche  DuBois,  Scene  9,  A  Streetcar  Named  Desire 

Vagueness or unclarity of thought is considered by Walter (1983) as a
worthy and necessary state of (human) mind for modeling. He appeals to
quantum mechanics (and, in particular, non-pure states) as, perhaps, the only
fruitful model by which to understand such phenomena. The analogy takes the
following form: The clarity that indeterminate ideas derive from rumination
and discussion parallels the reduction of uncertainty in a parameter of a
submicroscopic system that accompanies its quantum measurement. Walter
suggests that with an allowance for quantum-like brain states, brains can be
classified as physical symbol systems (processors that read, write, store, and
compare symbols) of the type described by Newell and Simon (Newell, 1981;
Newell & Simon, 1976; Simon, 1981).

As a revealing aside (developed more fully in Walter, 1980), Walter
(1983) asserts that both scientists' theorizing about perceiving and animals'
perceiving are largely story-telling. His implication seems to be that we
invent fictions that may or may not pertain to what is really going on but, at
least, help us muddle through our laboratories and our environments.
Scientists fashion explanations (in a manner of speaking) in an attempt to
sort out reaction times, thresholds, and so on, while perceivers contrive
hypotheses to sort out patches of color, horizontal lines, and so on. The
story's relation to reality is inconsequential as long as it is useful, where
useful seems to be read as leading to the next (preferably consistent)
fiction. If a fiction loses its usefulness to scientist or perceiver, it can be
replaced with a new one, no more real but, ideally, more useful.

As he rightly points out, Walter's position is in conflict with
ecological realism. Beyond that assessment, however, whatever it is that Walter
describes as ecological realism bears little resemblance to the framework
carved out by Gibson over some 30 years (e.g., Gibson, 1966, 1979, 1982) and
elaborated by others (e.g., Michaels & Carello, 1981; Reed & Jones, 1982; Shaw
& Turvey, 1982; Shaw, Turvey, & Mace, 1983; Turvey & Carello, 1981; Turvey,
Shaw, Reed, & Mace, 1981). In what follows, we shall point out where Walter
missteps in his treatment of realism, clarify our conflict with his strategy,
and elaborate our own strategy for modeling behavior at the ecological scale
of animal-environment systems (see below). In so doing, we shall attempt to
show that Walter's posture on realism, while understandable in the beleaguered
heroine of Tennessee Williams' play, is less sympathetic in a (reasonably
content) scientist.

*Cognition and Brain Theory, 1984, 7, 247-261.

Alternative  or  Contradictory  Descriptions  Do  Not  Deny  Realism 

While  Walter's  discontent  with  ecological  realism  includes  our  neglect  of 
quantum-like  brain  phenomena,  he  sees  the  existence  of  fictions — be  they 
scientists' oft-changing models of the world or animals' deceptive behavior in
times  of  danger  or  play — as  a  more  fundamental  difficulty  because  they  belie 
the  claim  that  reality  can  be  apprehended. 

The  pervasiveness  of  fictions,  deception,  play,  and  so  on,  make  the 
whole  ideology  of  "realism"  seem  rather  unlikely  to  me,  as  a 
productive  model  for  mammalian  nervous  systems.  A  notion  of  useful 
fictions  ("useful"  perhaps  to  be  defined  in  neo-Darwinian  terms) 
seems  more  likely  than  either  ecological,  or  naive,  realism,  to 
yield  an  adequate  description  of  this  most  complicated  organ  system. 

(p.  233) 

Not surprisingly, we do not agree with this evaluation of the ramifications of
such phenomena. First, dubbing them "fictions" is inaccurate and misleading.
And, second, it is unlikely that fictions, with the suggestion that the
attainment of goals is accidental, could ever be reliably useful. Let us
elaborate this argument.

The notion that science engages in the fabrication of useful fictions has
a parallel in legal practice (Walter, 1980). Just as it is convenient but
incorrect to conceive of a corporation as a single person in certain legal
circumstances so, too, is it useful but fictitious to conceive of space as
Euclidean in some circumstances and curved in others. Walter claims that
science would be better served by acknowledging that its models, however useful,
are fictions "because the inconsistencies between scientific views of
'reality' in different contexts will be more damaging" (Walter, 1980, p. R366).

But do the seeming contradictions entailed by different characterizations
of space, for example, remove all characterizations from the realm of reality
(unqualified by quotation marks)? In other words, if a given notion changes
relative to changes in the problem of interest, does this relativity preclude
a consideration of that notion as objective and real? We have argued
elsewhere that it does not and, indeed, that the concept of an absolute reality
that would be appropriate for all grains of analysis is untenable (Gibson,
1979; Michaels & Carello, 1981; Shaw, Turvey, & Mace, 1982; cf. Prigogine &
Stengers, 1984, chap. 7).

Appropriateness is the key idea here: the level of description of reality
must be commensurate with the level of inquiry, that is, with the type of
systemic interactions that are of interest (cf. Rosen, 1978). Although Walter
(1980) says, "When making human-scale measurements, for example, precision
seldom requires us to incorporate either relativistic space curvature or
super-spacelike microtopological fluctuations" (p. R367), it is not
disembodied "precision" that renders such analyses unnecessary. Rather, those
analyses are inappropriate because human activities do not occur at those
levels. Human (and animal) behavior occurs with reference to the
animal-specific, activity-relevant properties of the environment, what Gibson
has termed affordances (1979). Affordances, it is proposed, are the appropriate
level of description of reality for the ecological scale. The lengthy, difficult
search initiated by Grinnell (1917) and Elton (1927) to find a systematic and
evolutionarily consistent way to define the econiche (the related
environmental realities supporting a given species' lifestyle) has begun to
focus on the view of the econiche as an affordance structure (Alley, in press;
Patten, 1982).

Affordances  are  both  relative — they  are  defined  with  reference  to  a 
particular  animal — and  objective — they  are  defined  by  persisting  properties  of 
the  environment.  As  an  example,  consider  a  brink  in  a  surface.  For  an  animal 
of  a  given  size,  that  brink  affords  stepping  down;  for  an  animal  of  a  given 
smaller  size,  that  brink  affords  falling  off.  The  reality  of  that  particular 
layout  of  surfaces  as  a  step-down  place  or  a  falling-off  place  is  relative  to 
the  animal.  Yet  the  nature  of  those  relative  realities  is  determined  by  the 
independent  character  of  the  surface  layout — for  example,  that  it  is  comprised 
of  vertically  separated  substantial  surfaces  rather  than  liquid  ones.  This 
echoes  a  point  made  by  Lewis  (1929): 

Relativity is not incompatible with, but requires, an independent
character in what is thus relative. And second, though what is thus
relative cannot be known apart from such relation ... all such
relative knowledge is true knowledge of that independent character
which, together with the other term or terms of this relationship,
determines this content of our relative knowledge. (pp. 172-173)

The  coexistence  of  contradictory  descriptions  of  reality  (e.g., 
step-downable  vs.  not  step-downable,  curved  vs.  Euclidean  space)  does  not  mean 
that  these  descriptions  are  fictions  (cf.  Ben-Zeev,  in  press).  It  simply 
means  that  different  problems  appeal  to  different  aspects  of  reality.  No  one 
description  is  universally  privileged  (cf.  Alley,  in  press;  Rosen,  1978). 
Indeed,  contrary  to  Walter's  efforts  to  marshal  quantum  phenomena  in 
opposition  to  realism,  the  same  point  has  been  made  for  that  domain  by 
Prigogine and Stengers (1984):

The irreducible plurality of perspectives on the same reality
expresses the impossibility of a divine point of view from which the
whole of reality is visible (p. 224). The real lesson to be learned
from the principle of complementarity [italics added], a lesson that
can perhaps be transferred to other fields of knowledge, consists in
emphasizing the wealth of reality, which overflows any single
language, any single logical structure. (p. 225)

Biased by his concern about what scientists do when they theorize about
the world, Walter is confused in his attitude toward what animals (including
humans) do when they perceive their environments. He claims that the fictions
by which scientists think they understand the universe have parallels in those
cases where perceivers are duped by deceptions. We have already argued that
scientific models of natural phenomena need not be considered fictions, even
if models of the same phenomenon at different levels are inconsistent. But
surely there are scientific models that are just plain wrong: phlogiston,
aether, and spontaneous generation, to name a few. Do these speak to the
possibility of perceivers knowing reality? They do not because they involve
issues of scientific realism, not perceptual realism (see Blackmore, 1979).
That is to say, the question of whether or not scientists can be successful in
understanding nature is independent of whether or not perceivers are
successful in knowing the environment as it constrains their day-to-day
activities. Scientists can flounder for any number of reasons (religious dogma,
bad experiments, stupidity), but for animals to "move so they can eat, and eat
so they can move" (Iberall, 1974) and thereby survive, they must be in contact
with the facts of their environments. Animals cannot act effectively with
respect to fictions.

What of Walter's contention that the fictions are useful? Doesn't that
empower them to guide activity? It is not at all clear how a fiction,
unfettered as it is by actual states of affairs, could ever be useful. What
guides the construction of a fiction so that it is at least relevant to an
intended action, so that, for example, a given layout of surfaces is
fictionalized as being in the realm of stepping (on) or falling (off) rather
than swimming (in), squeezing, eating, ad infinitum? And by what criterion
might a given fiction be deemed useful? There must be some standard of
comparison. If the actual state of affairs provides the comparison, realism
cannot be avoided.

Deception  Presupposes  Realism 

Walter's example of deceptive animal behavior might seem tailor-made for
a fiction framework. A mother bird saves her offspring by feigning injury so
that a fox will follow and attack her in the mistaken belief that her broken
wing will prevent her escape. She has created a fiction (the predator
perceives an injury that does not exist) that is useful in preserving her
species. Such circumstances are quite rare in nature, however; not all
animals engage in deception, and, for those that do, deception constitutes a
small part of their behavioral repertoires. Deception provides a disputable
foundation, therefore, upon which to build an account of perceiving.
Nonetheless, we would emphasize the lawful basis that allows the mother to
enact a successful charade and the fox to act upon it. She must constrain her
musculature in just that way that will produce postural and joint adjustments
specific to a particular dynamic condition (viz., material structure too weak
to support the characteristic wing movement). For his part, the fox must
detect the dynamics that underlie the bird's kinematic display. In order to
pursue a realist basis for deceptive behavior, we will elaborate this
so-called kinematic specification of dynamics (or KSD) principle (Runeson,
1977/1983; Runeson & Frykholm, 1983).

The principle starts with the reasonable assumption that, because the
body is composed of certain masses and lengths and types of joints, only
certain movements will be biomechanically possible. The biomechanics will also
determine what one must do to maintain balance and cope with reactive forces
(those "back-generated" by the act of moving). The kinematic properties of an
action (its variously directed motions, its accelerations and decelerations)
are determined by the dynamic conditions that underlie it: the forces produced
intentionally and unintentionally by the animal and those supplied by the
surrounding surfaces of support. The KSD principle suggests that a reciprocal
relationship also exists: The kinematic properties of acts are transparent to
the dynamic properties that caused them. For an observer, this principle
reads: The ambient optic array (see Gibson, 1979; Lee, 1974, 1976) is
structured by an animal's movements such that macroscopic qualitative
properties of the optic array are specific to and, therefore, information
about, the forces that produced the movements.

The principle finds support in experimental investigations of human
movement perception that use Johansson's (1973) patch-light technique. This
methodology entails limiting an observer's view of actors (i.e., people who
engage in activities) to small lights that are attached to their major joints.
When a person engages in some activity, a transforming pattern of lights is
generated. Perceivers find this limited optical structure to be informative
about a number of properties, including metrical (length of throw of an
invisible thrown object of unknown mass [Runeson & Frykholm, 1983]),
biomechanic (gender of a walker [Cutting, Proffitt, & Kozlowski, 1978;
Kozlowski & Cutting, 1977; Runeson & Frykholm, 1983]), and kinetic (the weight
of a lifted box [Runeson & Frykholm, 1981]). Importantly, Runeson and
Frykholm (1983) have shown that perceivers are not easily fooled by actors'
efforts to be deceptive. Despite attempts to fake the weight of a lifted box,
observers not only perceive the real weight but are aware of the deceptive
intention and the intended deception (i.e., what weight is being faked) as
well. Similar results are found in attempts to be deceptive about one's
gender (through gait and carriage in a variety of actions): observers are
aware of both real gender and faked gender. The point to be underscored is
that an actor can structure light in ways that provide information about
conditions that do not exist (see Gibson, 1966; Michaels & Carello, 1981;
Turvey et al., 1981, for realist accounts of this fact) while simultaneously
(and unavoidably) providing information about conditions that do exist, and
perceivers can be aware of both.

Runeson and Frykholm draw a parallel with the dual reality of pictures,
especially as it has been described by Gibson: There is information about
objects represented in the picture and information about the picture itself as
an object. "The duality of information in the array is what causes the dual
experience" (Gibson, 1979, p. 283). The possibility of dual awareness may
speak to the dearth of true deceptions in nature. For very sound physical
reasons, situations that lend themselves to single awareness deception are,
contrary to what Walter seems to imply, difficult to manufacture and, in
consequence, quite rare. Intraspecific threat and play behavior, on the other
hand, are found throughout the animal kingdom. But it seems to be a misnomer
to label these "deceptions" in the sense of trickery. Baboons who bare their
teeth have not fabricated a fearsome weapon. They are suggesting that they
would rather not use the ones they have. Chimpanzees who play attack-and-flee
are not deluded; they behave differently in true fight-escape circumstances
(Loizos, 1969). Play provides an opportunity to learn about one's
environment, conspecifics, and one's own behavioral possibilities.

We have argued that characterizing perception as useful fictions is
inadequate to explain behavior in natural circumstances. An explanation of
effective behavior requires a realist framework with the animal-environment
system as the unit of analysis. Walter, however, is skeptical of whether such an
analysis is possible. We contend that his objection is based on an
overevaluation of what can be distilled from brain state accounts and a
misunderstanding of what "animal-environment system" means. We will deal with
each of these issues in the next two sections.

Brain States Are An Inadequate Basis For Ascribing Intentional Content

Walter implies that any perspective that does not advert to observations
of brain states cannot provide a dynamically useful formulation of behavior.
However, he prudently avoids any discussion of how observations of brain
states would yield the proposed useful formulation. Presumably, Walter's
advocated observations or measurements of the brain, no matter how precise or
vague those measurements may be, would provide only extensional descriptions.
And, presumably, a physical or biological theory of the brain strictly
consistent with such observations could only be extensional. At best,
observations of brain states, purely interpreted, would lead to an account
roughly of the form: In the context of functional brain organizations P and Q,
functional brain organization R has the capacity of inducing functional brain
organization S. This would not be a dynamically useful formulation of behavior.
No matter how elaborate and detailed such an extensional account becomes, it
will never allow Walter to answer apparently straightforward questions about
prosaic behaviors. For example, how does an outfielder know to charge in rather
than retreat to catch a ball (Todd, 1981)? Why does a child, on seeing a
particular surface, initiate crawling rather than walking to traverse the
surface (E. Gibson, 1983)? The important ingredient missing from the foregoing
brain-state based account of behavior is intentionality.

A dynamically useful formulation of behavior grounded in observations of
brain states requires minimally (1) a principled basis for individuating brain
states, and (2) a principled basis for ascribing content to individuated brain
states. The latter refers to the problem of systematically upgrading the
extensional characterizations of brain states to intentional
characterizations, ordinarily expressed by intensional statements (Dennett,
1969; Fodor, 1981; but see Searle, 1983). The point is that without identifying
the contents (the significances, the meanings, the message functions, the
signalling functions, etc.) of brain states, the brain theorist's view of brain
function in relation to behavior is empty. The intentional characterization
earns for the brain theorist the luxury of addressing the question of what the
brain states are about. From what observations and on what grounds would an
advocate of the explanatory power of brain states fashion intentional
characterizations? Those characterizations arise at and are the sine qua non
of the ecological scale of animal-environment systems.

Intentional characterizations should not be interpreted as referring to
systemic states that are in addition to or separate from those extensionally
characterized. Intentional characterizations usually comprise alternative
(discrete, symbolic) descriptions of a system's states, descriptions that
complement the extensional (continuous, dynamical) accounts of how a system is
doing what it is doing. Pattee (e.g., 1973, 1977) has been foremost in
identifying the problem of understanding how these two complementary modes of
description of any complex system can be treated in a physically consistent
way. The ecological approach to perception and action has been concerned
similarly with the complementarity of intentional and extensional
characterizations (e.g., Carello, Turvey, Kugler, & Shaw, 1984), but it has
been concerned more directly with elaborating the extensional basis for
ascribing intentionality to states of the animal-environment system in a
principled manner (e.g., Gibson, 1979; Kugler, Kelso, & Turvey, 1980, 1982;
Turvey et al., 1981). This strategy has been chosen because the principled
ascription of content to the states of a system rests ultimately on the
accuracy and specific predictions of the extensional account of the system. As
Dennett (1969) puts it:

The ascription of content is thus always an ex post facto step, and
the traffic between the extensional and the intentional levels of
explanation is all in one direction. (p. 86)

To  the  extent  that  the  extensional  basis  for  a  system's  phenomena  is 
underestimated  and/or  unknown,  the  intentional  characterization  of  the  system 
is  likely  to  be  ungrounded  and  fatuous;  ordinary  systemic  states  get  ascribed 
near  magical  functions  or  powers  (section  below).  And  this  latter  statement 
identifies,  in  a  nutshell,  the  danger  and  inadequacy  of  seeking  an  account  of 
behavior,  as  Walter  advocates,  in  observations  limited  to  brain  states. 

The  Animal-Environment  System  as  the  Appropriate  Unit  of  Analysis 


Walter focuses his attack on realism on Turvey and Carello (1981). He
discusses the position thusly:

This position claims that the joint situation of an organism and its
environment is the only correct fundamental concept for brain/mind
modeling.... I regard their presumption that a state of the
brain-and-environment nexus can be observed as a fatal flaw in
ecological realism. In my view, the state of a mammal's brain cannot,
in most situations, usefully be observed...without so severely
interfering with that state, by your observing...that the state will
change in an unpredictable and uncontrollable way.... (p. 231)

Interestingly, the word "brain" never appears in the Turvey and Carello
manuscript. Indeed, eschewing brains as the appropriate entities to model for an
understanding of psychological phenomena is at the heart of using "ecological"
to modify our brand of realism. We are interested in how organisms (including
humans) are able to perceive their propertied environments in a way that will
allow them to behave effectively with respect to those environments. A
runner, be it human, gnu, or cockroach, does not steer around representations
or brain states; it avoids real obstacles and goes through real openings.
Couching problems in such terms is not, as Walter claims, simply a
"programmatic and descriptive phase" that ecological realism is going through.
The "dynamically useful formulation of behavior" that Walter asserts is
unavailable from our strategy not only is found in a realist framework but, we
would argue, can only be provided by such a perspective. One of Gibson's
favorite examples, the problem of controlled collisions in locomotion, will be
used to buttress this argument.

As an animal moves through a cluttered surround, it sometimes steers
around objects, sometimes contacts them gently, and sometimes collides with
them violently. In order to control encounters with the environment,
activity-relevant (dynamically useful) information must be available. This
includes information specific to what is moving (e.g., the animal or the
objects that surround it), direction of locomotion, obstacles and apertures in
one's path, time to contact (if it should occur), and force of contact (if it
should occur). This information has been demonstrated by a number of
investigators (e.g., E. Gibson, 1983; J. Gibson, 1979; Lee, 1976, 1980; Lishman
& Lee, 1973; Schiff, 1965) to exist in what might be termed the morphology of
the optic flow field (Kugler, 1983; Kugler & Turvey, in press; Solomon,
Carello, & Turvey, 1984). We will highlight some of the findings here but for
detailed analyses, the reader should refer to the cited works.

Although the problem of distinguishing one's own movement from
displacements of the surround has been a long-standing puzzle in orthodox
accounts of perceiving, Gibson (1979) provided a simple solution, viz., global,
smooth change in the optic array specifies egomotion, local discontinuous change
specifies motion of an object in the environment. Moreover, one's direction
of locomotion is also specified by the form of the optic flow field: Global
optical expansion specifies forward movement (where the focus of expansion
specifies the point toward which one is moving) while global optical
contraction specifies retreat (where the focus of contraction specifies the
point from which one is moving). If the appropriate flow fields are generated,
the appropriate actions will be constrained (e.g., in the face of simulated
global optical expansion, a person will make postural adjustments backward to
compensate for the perceived forward movement [Lishman & Lee, 1973]; when
confronted with local optical expansion, a person [or animal] will duck [Schiff,
1965]). The same sort of analysis distinguishes obstacles from apertures: A
closed contour is specified as an obstacle when there is a loss of structure
outside the contour during approach; it is specified as an opening when there
is a gain of structure inside the contour during approach (J. Gibson, 1979).
Infants as young as six months will duck from approaching obstacles but try to
look inside approaching openings (E. Gibson, 1983).
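
The distinction between global expansion and global contraction can be given a minimal computational sketch. The Python fragment below is an illustration only, not a procedure taken from the cited studies. It assumes a flow field sampled as image points with associated velocity vectors; the function name `classify_global_flow`, the sampling scheme, and the default focus at the origin are all hypothetical. It sums the radial component of the flow about a candidate focus and reads a positive sum as expansion (forward movement) and a negative sum as contraction (retreat).

```python
def classify_global_flow(points, flows, focus=(0.0, 0.0)):
    """Classify a sampled flow field by its net radial component about `focus`.

    points: list of (x, y) image positions (hypothetical sampling of the array)
    flows:  list of (u, v) velocities at those positions
    """
    radial = 0.0
    for (x, y), (u, v) in zip(points, flows):
        # Dot product of the position relative to the focus with the flow
        # vector: positive when the flow points away from the focus.
        radial += (x - focus[0]) * u + (y - focus[1]) * v
    if radial > 0:
        return "expansion (forward movement)"
    if radial < 0:
        return "contraction (retreat)"
    return "no net radial flow"


# Four sample points around the focus, flowing outward and then inward.
sample_points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
outward = [(0.1 * x, 0.1 * y) for x, y in sample_points]
inward = [(-0.1 * x, -0.1 * y) for x, y in sample_points]
```

Under these assumptions, `classify_global_flow(sample_points, outward)` reports expansion and the same call with `inward` reports contraction. The sketch says nothing about how the focus itself is located or about local, discontinuous change.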

If an animal wishes to steer around objects, it must move in such a way
that optical expansion is centered in openings rather than obstacles. In
order to contact objects (and to vary the force with which they are contacted),
two more optical flow properties are needed. The inverse of the relative rate
of dilation of a topologically closed region of the optical flow field (e.g.,
that structured by a wall) specifies the time at which a moving animal will
contact that region. The derivative of the time-to-contact variable is
information about the imminent momentum exchange: If it is greater than a
certain critical value, the animal will stop short of contact; if it is equal to
that critical value, the contact will be soft; if it is less than that critical
value, there will be a momentum exchange and the contact will be hard (Kugler,
Turvey, Carello, & Shaw, 1984; Lee, 1976, 1980).
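
The two optical variables just described can be illustrated with a short Python sketch. It is offered under stated assumptions rather than as the analysis given in the cited works: small-angle optics (an object of a given size at a given distance subtends roughly size/distance radians), a head-on approach, and constant braking. The function names are hypothetical. The text speaks only of "a certain critical value"; the value -0.5 used here is an assumption that follows from constant-deceleration kinematics, since the stopping distance speed²/(2 × deceleration) equals the remaining distance exactly when the derivative of time-to-contact is -0.5.

```python
import math


def optical_angle(size, distance):
    # Small-angle approximation (an assumption of this sketch).
    return size / distance


def dilation_rate(size, distance, speed):
    # Time derivative of size / distance when distance shrinks at `speed`.
    return size * speed / distance ** 2


def time_to_contact(theta, theta_dot):
    # Inverse of the relative rate of dilation (theta_dot / theta).
    return theta / theta_dot


def tau_dot(distance, speed, decel):
    # d/dt of (distance / speed), with d(distance)/dt = -speed and
    # d(speed)/dt = -decel (constant braking, an assumption).
    return -1.0 + distance * decel / speed ** 2


# Assumed critical value; not stated in the text.
CRITICAL_TAU_DOT = -0.5


def classify_contact(td, tol=1e-9):
    if td > CRITICAL_TAU_DOT + tol:
        return "stops short"
    if td < CRITICAL_TAU_DOT - tol:
        return "hard contact"
    return "soft contact"
```

For instance, with a surface 10 units away approached at 5 units per second, `time_to_contact(optical_angle(2.0, 10.0), dilation_rate(2.0, 10.0, 5.0))` gives 2 seconds regardless of the object's size, and `classify_contact(tau_dot(10.0, 5.0, 1.25))` gives a soft contact. Note that the time-to-contact estimate uses only optical quantities; distance and speed enter here only to simulate the optics.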

Notice  that  these  properties  do  not  exist  in  the  animal  or  in  the 
environment  but  are  only  def ined  for  the  animal-envlrorment  system.  The  com¬ 
ponents  of  the  system  are  not  ruled  by  the  indeterminacy  that  governs 
conjugate  variables  in  quantum  mechanics.  That  is  to  say,  an  exact  descrip¬ 
tion  of  one  component  does  not  mean  that  the  other  component  cannot  be  deter¬ 
mined.  On  the  contrary,  measuring  one  of  the  components  in  isolation  not  only 
fails  to  provide  an  understanding  of  the  system  but  gives  a  misleading  picture 
of  the  component  that  is  being  measured.  This  is  the  problem  of  overdecompos¬ 
ing  a  partial  system  from  the  total  system  that  includes  it  (Turvey  &  Shaw, 
1979;  cf.  Ashby,  1963;  Humphrey,  1933;  Weiss,  1969).  Although  science 
requires  decomposition  to  a  certain  extent  in  order  to  make  its  problems 
manageable,  the  parsing  of  systems  cannot  be  done  cavalierly.  An  unprincipled 
selection  of  a  system  in  which  a  phenomenon  is  thought  to  reside  may  make  the 
phenomenon  appear  capricious  and  compel  the  scientist  to  attribute  magical 
powers  or  content  to  the  partial  system  (Ashby,  1963;  Turvey  &  Shaw,  1979). 
The  appropriate  grain  of  analysis,  however,  may  reveal  the  law-governed 
determinacy  that  is  unavailable  in  the  partial  system  (Weiss,  1969). 
For example, if we take a climber-stairway system (Warren, 1984) as an
instance of an animal-environment system, several points can be illustrated.
First,  there  is  optical  information  for  a  category  boundary  for  ac¬ 
tion — perceivers  can  see  which  of  a  variety  of  stairways  (constructed  with 
risers  of  varying  heights)  are  climbable  in  the  normal  way  (i.e.,  without  us¬ 
ing  hands  or  knees).  Second,  there  is  a  perceptual  preference  for  stairways 
that  would  be  easiest  to  climb  (as  determined  by  measures  of  energy  expendi¬ 
ture  during  climbing).  Third,  both  of  these  relationships  can  be  described  by 
a method of intrinsic measurement, in which one part of a given system (e.g., on
the animal side) acts as a natural standard against which a reciprocal part of
the  system  (e.g.,  on  the  environment  side)  can  be  measured  (Warren,  in  press; 
Warren  &  Shaw,  1981;  cf.  Bunge,  1973;  Gibson,  1979).  Thus,  the  critical  riser 
height/leg  ratio,  indexing  the  action  boundary,  is  .89  whereas  the  optimal  ra¬ 
tio,  indexing  minimum  energy  expenditure,  is  .26.  These  ratios  are  the  same 
for  all  climbers,  short  and  tall.  Finally,  each  of  these  ratios  is  a  measure 
of  animal-environment  fit;  each  is  an  index  of  the  state  of  that  system.  No¬ 
tice  that,  unlike  Walter's  quantum  systems,  the  state  does  not  change  by 
measuring  it  and  predictions  are  not  invalidated  by  observations.  For  a  given 
individual,  if  the  ratio  of  riser  height  to  leg  is  less  than  or  equal  to  .89, 
the stair will be climbable; if the ratio equals .26, that stair will be
(relatively) energetically cheap to climb. Those relationships do not change.
And  nowhere  in  this  analysis  is  it  suggested  that  brain  states  can  be  or  ought 
to  be  observed. 
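The intrinsic measurement described above can be sketched in a few lines. The thresholds .89 and .26 are the ratios reported in the text. The function names, the tolerance around the optimum, and the sample dimensions in the usage note are hypothetical choices made only for illustration.

```python
CRITICAL_RATIO = 0.89  # action boundary reported in the text
OPTIMAL_RATIO = 0.26   # minimum-energy ratio reported in the text


def riser_leg_ratio(riser_height, leg_length):
    """Riser height scaled by the climber's own leg length (dimensionless)."""
    return riser_height / leg_length


def is_climbable(riser_height, leg_length):
    """True when the stair falls at or below the action boundary."""
    return riser_leg_ratio(riser_height, leg_length) <= CRITICAL_RATIO


def is_near_optimal(riser_height, leg_length, tolerance=0.02):
    """True when the ratio lies within `tolerance` of the optimum.
    The tolerance is an assumption of this sketch, not a reported value."""
    ratio = riser_leg_ratio(riser_height, leg_length)
    return abs(ratio - OPTIMAL_RATIO) <= tolerance


if __name__ == "__main__":
    # Hypothetical climbers facing the same 0.75 m riser.
    for leg_length in (0.75, 0.95):
        print(leg_length, is_climbable(0.75, leg_length))
```

The same riser can fall on either side of the action boundary depending on the climber, which is why the ratio, and not the riser height in extrinsic units, indexes the state of the system.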

Brainstates  Are  Not  the  Touchstone  for  Theories  of  Knowing 

Walter  would  not  deny  that  behaviors  like  stairclimbing  are  observable 
without  interference  from  the  observer  but  he  would,  no  doubt,  claim  that  they 
are  not  useful  or  worthwhile  to  model. 

I  have  (Walter,  1980)  characterized  those  aspects  of  behavior  that 
are  predictable  from  less  severely  interfering  observations,  as 
rather gross and physicalistic (contrasted with "psychodynamic");
they  seem  to  obey  a  correspondence  principle  or  classical  limit. 

They  also  tend  toward  conspiring  to  give  a  systematically  misleading 
impression...that they are a closed system, adequate to describe the
brain. (pp. 231-232)

Though  "gross"  may  be  used  pejoratively,  perceiving  and  acting  are  unabashedly 
macrophenomena. Walter's implication that the only interesting behavior is a
microbehavior  will  sever  him  from  consideration  of  a  gannet's  dive  for  a  fish 
(Lee  &  Reddish,  1981),  the  baseball  fielder’s  catch  of  a  deep  fly  ball  (Solo¬ 
mon,  Carello,  &  Turvey,  1984;  Todd,  1981),  and  his  own  efforts  to  avoid 
destruction  on  the  San  Diego  Freeway  (Gibson  &  Crooks,  1938).  While 
microphenomena  may  have  their  place,  that  place  is  not  a  privileged  one.  They 
need  not  and  will  not  serve  all  of  science.  Once  again,  this  attitude  is  not 
idiosyncratic  to  ecological  realists.  Rosen  (1978),  for  example,  in  stressing 
the  functional  and  organizational  character  of  certain  physical  systems,  ob¬ 
served: 

What  seemed  to  be  emerging  from  such  considerations  was  apparently 
the  antithesis  of  the  reductionist  program:  instead  of  a  single 
ultimate  set  of  analytic  units  sufficient  for  the  resolution  of  any 
problem, we find that distinct kinds of interactions between systems
determine new classes of analytic units, or subsystems, that are
appropriate to the study of that interaction. (p. xvi)

[These] families of analytic units, all of which are equally "real"
[are] entitled to be treated on the same footing; the appropriate
use  of  natural  interactions  can  enormously  extend  the  class  of  phy¬ 
sical  observables  [italics  added]  accessible  to  us....  (p.  xvii) 

Once  again  we  see  the  theme  of  appropriate  levels  of  reality,  this  time 
directed  at  the  question  of  what  counts  as  an  observable  for  physics. 

We  suspect  that  Walter  would  not  be  sympathetic  to  the  above  line  of 
argument,  countering  that  we  ought  to  focus  on  what  qualifies  as  a  legitimate 
observable  for  psychology,  instead  of  physics,  for  problems  of  knowing.  This 
is  apparent  in  his  contrasting  "physicalistic"  with  "psychodynamic"  aspects  of 
behavior,  charging  that  the  former  are  not  "adequate  to  describe  the  brain." 
This  is  where  his  emphasis  on  vague  states  of  human  mind  during  thinking, 
rumination,  and  the  like  clashes  most  dramatically  with  our  concern  for  the 
very  unvague  states  of  animal-environment  systems  during  perceiving  and  act¬ 
ing.  In  his  desire  to  understand  brain  (as  the  seat  of  mind),  Walter  holds 
thinking  and,  in  particular,  vague  thinking  as  the  focus  of  any  theory  of 
epistemic  agents.  But  for  us,  reliable  and  reproducible  behaviors  must  be  the 
touchstone  for  any  account  of  knowing.  In  infinitely  varying  settings, 
organisms  are  able  to  produce  the  same  appropriate  behavior  consistently, 
adapting  it  to  the  particular  circumstances.  For  example,  countless  times  a 
day  a  bird  will  take  off  from  a  variety  of  surfaces  of  support  at  a  wide  range 
of  heights  and  fly  toward  other  surfaces  of  support  at  varying  distances  away, 
alighting  on  them  gently.  Sometimes  it  will  steer  around  trees  or  pet  cats 
and  sometimes  it  will  have  a  direct  flight.  Obstacles  to  and  paths  for 
locomotion  and  the  appropriateness  of  accelerations  and  decelerations  can  be 
neither  indistinctly  specified  in  optical  flow  fields  nor  unreliably  detected 
if the bird is to locomote through its cluttered terrain successfully. It is
these kinds of behaviors, not indeterminate contemplations, that should
provide the standard against which to judge the adequacy of theories of knowing.

The  example  of  a  bird  in  flight  is  an  important  one  because  it  contains 
one  feature — collisions  with  plate  glass  windows — of  the  sort  that  Walter, 
among  others,  uses  to  try  to  refute  realism.  The  style  of  the  argument  can  be 
characterized  as  follows:  A  bird  who  sees  the  window  as  an  opening  and  flies 
into  it  has  not  perceived  reality  correctly  and  has  not  acted  effectively. 
But  in  situations  of  so-called  perceptual  "mistakes,"  we  embrace  the  distinc¬ 
tion  drawn  by  Lewis  (1929) — ignorance  of  reality  is  not  to  be  equated  with 
erroneous  knowledge  of  reality.  A  window  does  not  structure  the  optic  array 
at  all  points  of  observation  so  as  to  specify  the  substantiality  of  the  trans¬ 
parent  surface.  The  bird  is  ignorant  of  that  aspect  of  reality  because  infor¬ 
mation  about  that  aspect  is  not  available  to  those  points  of  observation  along 
the  bird's  approach.  Information  about  substantiality  is  available,  however, 
to  other  points  of  observation,  viz.,  on  those  paths  where  the  optic  array  is 
structured  by  more  reflective  angles  of  the  glass.  When  information  about  an 
obstacle  to  locomotion  is  not  available,  a  bird  will  not  change  its  path  of 
locomotion.  Perception  in  the  first  case  is  veridical;  perception  in  the  sec¬ 
ond  case  is  "veridical  but  partial"  (Lewis,  1929,  p.  176). 

A  Final  Note 

The ecological approach addresses common behaviors under the general
rubric of controlled collisions (Kugler et al., 1984) or controlled encounters
(Gibson, 1979). Such behaviors cut across species and allow us to highlight
the  very  small  number  of  design  principles  responsible  for  the  wide  range  of 
activities  that  nervous  systems  support.  While  the  processes  that  thinkers  go 
through  in  conceiving  and  refining  their  ideas  are  intriguing,  they  should  not 
provide  the  starting  point  for  an  explanation  of  perception  in  the  service  of 
activity.  Putting  them  at  the  forefront  of  things  to  be  explained  is  an 
apotheosis  of  the  exotic  and  likely  to  be  premature.  As  a  parallel,  consider 
the  rainbow,  which  has  fascinated  philosophers  and  scientists  for  centuries. 
An  adequate  quantitative  theory  that  accounts  for  all  of  the  features  and 
quirks  of  that  phenomenon  awaited  the  development  of  geometrical  optics,  and 
an  understanding  of  the  wave  and  particle-like  properties  of  light,  polariza¬ 
tion,  and  the  complex  angular  momentum  method  (Nussenzveig,  1977).  We  may 
have  to  be  similarly  thorough  in  uncovering  those  fundamental  principles  at 
the  ecological  scale  on  which  the  reliable  and  reproducible  behaviors  of 
epistemic  agents  are  based  and  on  which  an  acceptable  account  of  thinking  will 
rest. 
THE INFORMATIONAL SUPPORT FOR UPRIGHT STANCE*


Claudia Carello,† M. T. Turvey,†† and Peter N. Kugler†††


Nashner  and  McCollum  suggest  that  (1)  perturbations  of  the  body  relative 
to  the  gravitational  field  and  the  surface  of  support  parse  into  a  small  num¬ 
ber  of  circumscribed  kinetic  states  (regions  of  disequilibrium),  and  (2)  a 
functional  muscular  organization,  to  restore  upright  posture,  corresponds  to 
each  state.  Though  the  authors  talk  about  the  sensing  of  these  states,  they 
give  no  indication  of  the  relevant  information.  In  a  related  way,  we  think, 
their  references  to  neural  signals  that  require  interpretation,  their  appeals 
to  memory  (presumably  of  previous  trajectories,  previous  initial  conditions, 
previous  sensory  consequences,  and  previous  postural  achievements),  and  their 
supposition  of  anatomically  defined  senses  uniquely  tied  to  distinct  frames  of 
reference  seem  to  run  counter  to  the  general  Bernsteinian  (1967)  strategy  that 
they  are  pursuing,  that  is,  compressing  in  a  principled  fashion  a  movement 
problem  of  potentially  very  many  degrees  of  freedom  into  a  movement  problem  of 
very  few  degrees  of  freedom. 

In  contrast,  we  are  inclined  strongly  toward  Gibson's  (1966,  1979)  revi¬ 
sion  of  the  senses  in  terms  of  perceptual  systems — active,  interrelated  sys¬ 
tems  (as  opposed  to  senses)  that  detect  information  (rather  than  have  sensa¬ 
tions)  about  the  perceiver-environment  relation  (rather  than  about  their  own 
states).  Taking  a  Gibsonian  stance,  we  ask  whether  there  could  be  information 
specific  to  a  circumscribed  disequilibrium  state,  regardless  of  etiology; 
whether  there  could  be  information  specific  to  approaching  a  region’s  bound¬ 
ary,  regardless  of  the  details  of  the  trajectory;  and  whether  such  information 
can  be  independent  of  the  mode  of  attention.  We  will  start  with  Gibson's 
strict  interpretation  of  information  with  respect  to  vision,  demonstrate  that 
equivalent information is obtainable by other perceptual systems, and conclude
with  speculation  about  properties  that  might  generalize  to  the  control  of 
stance. 

Information  is  optical  structure  lawfully  generated  by  the  persistent  and 
changing  layout  of  surfaces  and  by  the  displacements  of  the  body  (as  a  unit 
relative to the surface layout and as parts relative to each other). Because
the properties of the optic flow field are lawfully related to the properties
of the kinetic field underlying them, they are said to specify those kinetic
properties (see Runeson & Frykholm, 1983). Following Lee (1978), the optical
flow field is exterospecific (specific to properties of surface layout),
expropriospecific (specific to the orientational displacements of the point of
observation relative to the surface layout), and propriospecific (specific to
the  relations  among  the  parts  of  the  body).  And  it  can  be  specific  in  each  of 
these  ways  simultaneously.  How  can  this  be?  Each  class  of  facts  (extero, 
exproprio,  proprio)  imposes  a  distinct  patterning — or  structure,  or  form,  or 
morphology  (see  Kugler  &  Turvey,  in  press) — on  the  optical  flow  field.  These 
patternings  are  superposed  on  each  other  but  differentiable  from  one  another. 

Consider  one  such  patterning.  An  optical  flow  field  can  be  treated  as, 
roughly,  a  velocity  vector  field  where  the  vectors  represent  angular 
velocities  of  the  optical  elements  (see  Gibson,  1979).  When  all  vectors  are 
undergoing  a  graduated  magnification  about  a  fixed  point,  then  the  point  of 
observation is displacing rectilinearly toward the fixed point. It is
suggested that any globally smooth velocity vector field specifies a displacement
of  the  point  of  observation.  (Note  that  the  qualitative  macroscopic  proper¬ 
ties  of  the  field  are  what  matter,  not  the  individual  vectors.)  One  can 
sketch  a  law  at  the  ecological  scale  (see  Turvey,  Shaw,  Reed,  &  Mace,  1981) 
roughly of the form:


    displacement of point        LAWFULLY GENERATES        globally smooth velocity
    of observation            ------------------------>    vector field


This law defines a particular kind of information in Gibson's specificational
sense,  that  is, 

    globally smooth velocity        SPECIFIES        displacement of point of
    vector field                 --------------->    observation relative to surround


Note  that  the  optical  property  in  the  foregoing  law  is  a  kinematic 
abstraction  (dimensions:  length  and  time)  of  an  energy  distribution  (light) 
structured  by  properties  of  a  kinetic  field  (dimensions:  mass,  length,  and 
time),  that  is,  the  field  determined  by  the  animal  and  surface  layout.  Inso¬ 
far  as  the  same  kinematic  abstraction  could  be  supported  by  other  energy 
distributions  modulated  by  the  same  kinetic  facts,  this  analysis  can  be  gener¬ 
alized  to  other  modes  of  attention.  For  example,  if  a  sound  field  with  the 
same  globally  smooth  morphology  could  be  produced,  according  to  Gibson's 
law-based/specificational interpretation of information, listeners should
perceive themselves displacing relative to the surroundings (for confirming
evidence, see Dodge, 1923; Lackner, 1977). Defining this morphology over
deformations of the skin should yield the same impression of egomotion (again see
Lackner,  1977). 

This treatment of expropriospecification can be extended to extero- and
propriospecification. It is suggested that distinct flow morphologies, now
discontinuous rather than smooth, specify facts of surface layout and
relations among joints (Gibson, 1966, 1979). Again, these morphologies can be
instanced by different kinds of energy distributions. Note that it is possible to
describe vestibular stimulation (weights displacing in fluid-filled chambers
relative to gravity's pull) and haptic-somatic stimulation (nonrigid mechanical
deformations of the body's tissues) as kinematic or vector fields. And
note further that, in principle, these velocity vector fields are
characterizable alternatively as low-dimensional, macroscopic patternings. According to
the ecological law formulation from above, if a given disequilibrium state
gives rise to identical morphologies in the vector fields that are "attended
to" vestibularly, haptically, and visually, then the same postural fact will
be apprehended by each mode of attention.

Nashner  and  McCollum  are  puzzled  by  neural  signals  having  equivalent 
postural  consequences  when  the  signals  are  different.  In  our  view,  their  puz¬ 
zlement  is  based  on  the  wrong  formulation:  Information  may  be  identical  when 
neural  signals,  stimuli,  etc.,  are  different  (see  Gibson,  1966,  p.  55). 
Nashner and McCollum feel that neural signals must be interpreted. Signal is
a metaphor for sensations, and sensations strictly speaking can only be about
states of nerves; hence the need for interpretation. Again, their formula is
suspect.  Information  is  about,  in  the  sense  of  specific  to,  animal-environ¬ 
ment  facts.  It  needs  to  be  detected,  and  its  differentiation  and  pick  up  by  a 
perceptual  system  improve  with  practice,  but  to  interpret  it  would  be 
superfluous. 

We  have  suggested  that  the  information  about  kinetic  conditions  (such  as 
regions  of  postural  equilibrium)  is  to  be  found  in  the  morphology  of  kinematic 
fields.  Moreover,  the  information  is  indifferent  to  the  medium  that  has  been 
structured  kinematically.  We  conclude  with  a  speculation  about  the  morpholog¬ 
ical  property  specific  to  approaching  a  region's  boundary — a  generalization  of 
the time-to-contact variable, τ, and its derivative (Lee, 1980).

For the visual system, τ is the inverse of the relative rate of dilation
of, roughly, the optic array. It specifies when one will contact a surface on
the path of locomotion. Its derivative specifies how hard the imminent
collision will be (Lee, 1980). Our conjecture is that τ may be a very general
property of kinematic (flow) fields. Any kinetic field will have, as a rule,
the equivalents of contactable "surfaces"; for example, attractors, basins,
etc. Is there, as a rule, the equivalent of τ in the kinematic abstraction of
any kinetic field (for example, nonrigid mechanical distortions of body
tissues)? Suppose that the authors' regions of equilibrium are detected
haptically. Then the proposed availability of τ and its derivative would provide a
principled haptic basis for regulating forces to prohibit crossing regions.

In sum, Gibson's treatment of information seems relevant to Nashner and
McCollum  in  this  sense:  The  low  dimensionality  of  postural  control  they  prom¬ 
ise  on  the  side  of  action  could  be  reciprocated  (as  it  must)  on  the  side  of 
perception. 
DETERMINING THE EXTENT OF COARTICULATION: EFFECTS OF EXPERIMENTAL DESIGN*


Carole E. Gelfer,† Fredericka Bell-Berti,†† and Katherine S. Harris†


Abstract.  Substantial  differences  in  the  reports  of  the  extent  of 
anticipatory  coarticulation  have  made  the  task  of  deciding  among 
unifying models of the process difficult. Two conceptually
distinct groups of theories of coarticulation have emerged, one
positing the migration of articulatory features to preceding segments and
the  other  positing  the  temporal  cohesiveness  of  the  components  of 
segmental  articulations.  In  studies  of  anticipatory  lip  rounding,  a 
possible  source  of  the  differences  reported  in  its  extent  prior  to  a 
rounded  vowel  is  that  the  alveolar  consonants  commonly  employed  in 
these  studies  are  presumed  to  be  unspecified  with  regard  to  lip 
configuration. Thus, the presence of EMG activity and/or protrusive
lip  movement  during  these  consonants  has  been  presumed  to  indicate 
vocalically  conditioned  lip  activity.  However,  if  this  activity  is 
directly related to the production of the consonant(s), then the
interpretation  of  these  results  is  problematic  unless  the  experimen¬ 
tal  design  allows  for  the  differentiation  of  consonantal  and  vocalic 
effects.  We  offer  here  both  data  suggesting  the  need  for  such 
considerations  and  a  paradigm  that  takes  these  considerations  into 
account. 


Introduction 

The  phenomena  of  anticipatory  coarticulation  have  generally  been  presumed 
to  reflect  underlying  aspects  of  speech  motor  control  (e.g.,  Kozhevnikov  & 
Chistovich, 1966; MacNeilage, 1970). However, substantial differences in
reports of the extent of anticipatory coarticulation make difficult the task of
providing  one  model  to  account  for  these  data.  Two  types  of  conceptually  dis¬ 
tinct  theories  of  anticipatory  coarticulation  exist,  both  of  which  attempt  to 
explain  the  apparently  nondiscrete  nature  of  speech  output  despite  a  presumed 
discrete  input.  According  to  one  type  of  theory,  upcoming  phones  are  scanned 
for  salient  features,  which  then  migrate  to  as  many  antecedent  phones  as  are 
neutral for, or in no way antagonistic to, the migrating feature (e.g.,
Daniloff & Moll, 1968; Henke, 1966; Kozhevnikov & Chistovich, 1966; Sussman &
Westbury,  1981).  Thus,  given  some  number  of  consonants  unspecified  for  lip 
configuration  immediately  preceding  a  rounded  vowel,  these  models  predict  that 
rounding  will  vary  in  its  onset  in  direct  proportion  to  the  number  and/or 


*A  version  of  the  paper  was  presented  at  the  103rd  Meeting  of  the  Acoustical 
duration of preceding segments. For example, Benguerel and Cowan (1974)
reported  that  upper  lip  protrusion  (in  anticipation  of  a  rounded  vowel)  begins 
as  early  as  the  first  consonant  in  clusters  of  as  many  as  six  consonants. 
However,  the  second  type  of  theory  proposes  that  the  observed  co-occurrence  of 
components  of  proximate  segments  results,  not  from  feature  migration,  but  from 
the  overlapping  of  articulatory  components  of  those  segments  (e.g.,  Bell-Berti 
&  Harris,  1981,  1982;  Fowler,  1980).  Thus,  in  the  absence  of  conflicting  de¬ 
mands,  the  onsets  of  different  components  of  the  articulation  of  a  given  phone 
will bear a stable temporal relationship to each other. For example,
Engstrand (1981) reported that lip protrusion activity for the rounded vowel /u/
occurs  at  a  relatively  fixed  time  before  the  onset  of  voicing  for  that  vowel, 
regardless  of  the  number  of  preceding  consonants. 

Despite  their  conceptual  differences,  however,  a  basic  premise,  having 
its  roots  in  traditional  linear  generative  phonology,  is  common  to  these 
models:  namely,  that  a  phone  is  neutral  (i.e.,  unspecified)  for  a  particular 
feature  when  that  feature  is  not  essential  to  its  realization  (Chomsky  & 
Halle, 1968, pp. 402-403). Consequently, when activity associated with a
given feature occurs during a segment that is "neutral" for that feature, that
activity  must  be  associated  with  another  segment,  and  the  time  at  which  this 
activity  begins  is  then  assumed  to  reflect  the  extent  of  anticipatory 
coarticulation.  In  fact,  however,  it  may  be  that  feature  descriptions  are 
incomplete.  For  example,  as  Benguerel  and  Cowan  (1974)  have  noted,  American 
English  /r/  is  commonly  produced  with  lip  protrusion,  although  this  protrusion 
often  goes  unmentioned  in  articulatory  descriptions  of  /r/. 

Upon  closer  consideration,  it  would  appear  that  many  of  the  differences 
in  the  existing  literature  might  be  reconciled,  and  thus  allow  the  development 
of  a  single  explanation  for  them,  were  these  assumptions  reconsidered.  The 
work  presented  here  is  part  of  a  study  designed  to  account  for  the  conflicting 
results  of  previous  studies,  and  therefore  to  test  the  predictions  of  the  dif¬ 
ferent  models  of  anticipatory  coarticulation. 

Methods 

The  alveolar  consonants  /t/  and  /s/,  whose  articulation  would  be  presumed 
to be neutral for lip constriction, were combined to form nine sequences
designed to vary both in the number of consonants and in overall sequence
durations.* The vowels in these utterances were /i/ and /u/, where V₁ was always
/i/, while V₂ was either /i/ or /u/. Thus, there were two vowel conditions,
the /iCₙu/ and /iCₙi/ conditions, each occurring with the nine different
consonant string combinations, for a total of eighteen utterance types (Table
1). The sequences were made by combining "words," and were presented to the
subjects in orthographic writing. The subjects were instructed to speak at a
comfortable  rate,  in  a  conversational  manner,  without  undue  attention  to  mark¬ 
ing  word  boundaries.  Thus,  the  subjects  could,  and  did,  differ  in  the  way  in 
which  they  executed  a  given  sequence  (for  example,  leased  tool  (/list#tul/) 
was often realized as the sequence [list:ul]). Two native speakers of
American English* produced between fifteen and twenty repetitions of each of the
eighteen VCₙVs, spoken within the carrier phrase "It's a ___ again."

Surface  electromyographic  (EMG)  recordings  (Allen,  Lubker,  &  Harrison, 
1972) of orbicularis oris inferior (OOI), right and left, were made
simultaneously with lip movement recording. Lip movements were tracked with an
optoelectrical tracking system (Capstan Co. Model 400 Optical Tracking System)
that sensed the position, in both the x and y planes, of an infrared
light-emitting diode (LED) positioned on the lower lip. All data were
simultaneously recorded on a 14-channel FM tape.


The  EMG  signals  were  rectified,  and  both  the  EMG  and  movement  data  were 
integrated and then digitized using a PDP 11/45 computer. The durations of
the  consonant  strings  were  measured  for  each  token  of  each  utterance  type,  us¬ 
ing  a  PCM  waveform-editing  program.  The  beginning  of  the  consonant  string  was 
defined  as  the  point  at  which  either  the  frication  appeared  in  the  waveform 
(in  consonant  strings  beginning  with  /s/),  or  the  higher  formants  disappeared 
from the waveform (indicating the onset of closure in consonant strings
beginning with /t/). The point in the acoustic signal corresponding to the release
of the consonant occlusion immediately preceding V₂ was identified as the end
of the consonant string and served as the acoustic reference, or line-up,
point for subsequent ensemble averaging. Thus, when V₂ was preceded by /t/,
the line-up point was the burst; when V₂ was preceded by /s/, the line-up
point was the end of frication before the second vowel.

The beginning of OOI activity associated with the /VCₙV/ sequences was
determined by identifying the time at which the EMG activity increased to a
level equivalent to the baseline plus five percent of the difference between
the baseline and the peak EMG levels. The beginning of the related movement
was  determined  by  identifying  the  onset  of  anteriorly-directed  lip  movement. 
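
The EMG onset criterion described above can be sketched as follows. This is a minimal illustration, assuming the ensemble-averaged signal is available as a list of samples and that the baseline level has already been estimated. The function name, arguments, and sample values are hypothetical and do not reproduce the authors' software.

```python
def emg_onset_index(samples, baseline, fraction=0.05):
    """Return the index of the first sample at or above
    baseline + fraction * (peak - baseline), or None if no sample qualifies."""
    peak = max(samples)
    threshold = baseline + fraction * (peak - baseline)
    for index, value in enumerate(samples):
        if value >= threshold:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical averaged EMG levels (microvolts), one sample every 5 ms.
    averaged = [10.0, 10.0, 11.0, 14.0, 30.0, 110.0, 90.0]
    sample_interval_ms = 5
    onset = emg_onset_index(averaged, baseline=10.0)
    print(onset, onset * sample_interval_ms)
```

Scaling the threshold to each utterance's own baseline-to-peak range keeps the criterion comparable across utterances with different overall activity levels, at the cost of depending on how the baseline is estimated.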

Results 

Some representative EMG data are shown for each subject (Figure 1a). The
EMG signals in each panel represent the ensemble average OOI EMG activity of
an /iCₙu/ utterance, with consonant string length (i.e., both the number of
segments and the durations of the sequences) differing across panels. The
onset of EMG activity occurs earlier as consonant string duration increases, so
that it would appear that there has been a migration of lip rounding back to
the beginning of the consonant string. In fact, when the onset of OOI EMG
activity for each of the nine /iCₙu/ utterances is plotted against the
respective consonant string durations (Figure 1b), it seems that, for both
subjects, these onsets bear an obvious relationship to consonant string duration.
That is, they occur earlier as string duration increases, with correlation
coefficients of r = .98 and .97 for TB and CH, respectively.

Although these results might be interpreted as evidence that lip rounding
has spread to the beginning of the "neutral" consonant string, we believe that
it is imperative to determine whether all of the EMG activity is actually
vowel-related or, alternatively, whether it reflects consonantal lip gestures.
In other words, if the OOI activity during the consonant string is
vowel-related, we would not expect to find such activity during the same
consonant string when it is followed by an unrounded vowel. We therefore
examined OOI activity for the minimally contrastive /iCₙi/ utterances, samples
of which are shown in Figure 2a. It is clear that, even within this unrounded
vowel environment, there is a significant amount of orbicularis oris activity
during the consonant string articulation. In fact, if we treat these /iCₙi/
data as we did those for the /iCₙu/ utterances, identifying the onset of OOI
activity for each utterance and plotting these times against consonant string
durations (Figure 2b), the resulting scatter plots are strikingly similar to
those for the /iCₙu/ utterance set (Figure 1b). That is, OOI activity begins
earlier as consonant string duration increases. (Subject CH produced only
eight of the nine /iCₙi/ utterances.) Obviously, then, this EMG activity
cannot reflect the onset of vowel-related lip rounding (i.e., the migration of
the vowel feature), since the relationship between consonant string duration
and the onset of OOI activity is observed in both rounded and unrounded vowel
environments. Indeed, correlation coefficients are as high or higher for
these /iCₙi/ utterances (r = .98 and .99 for TB and CH, respectively) than
they are for their rounded counterparts.
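
The kind of relationship summarized by these coefficients can be illustrated with a short computation. The sketch below computes a Pearson correlation between consonant string duration and EMG onset time. The numerical values are hypothetical placeholders chosen only for illustration (they are not the measurements plotted in Figures 1b and 2b), and the function name `pearson_r` is an assumed name.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for nine utterances (NOT the measured data):
# consonant string duration and EMG onset, both in ms before the line-up point.
durations_ms = [75, 120, 160, 210, 250, 300, 340, 390, 430]
onsets_ms = [90, 130, 175, 220, 270, 310, 360, 400, 450]

r = pearson_r(durations_ms, onsets_ms)
print(f"r = {r:.2f}")
```

A coefficient near 1 for both the rounded and the unrounded vowel sets is what would be expected if the early activity tracks the consonant string rather than the vowel.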

It is obvious, then, that the progressively earlier EMG activity must
reflect consonant-related events. This is made more apparent when the EMG
curves for the minimally contrastive /iCₙu/ and /iCₙi/ utterances are
superimposed (Figure 3). The two signals diverge in the vicinity of the
acoustic onset of V₂, with a second peak of activity evident when V₂ is /u/,
while EMG activity is suppressed when V₂ is /i/. However, because the EMG
signal never returns to a baseline level prior to /u/, the onset of the
/u/-related EMG activity was determined statistically as the time at which the
difference (in microvolts) following the divergence of the two signals reached
significance (p < .05).

Figure 1. Upper panels (1a): Ensemble-average EMG data for subjects TB
(left) and CH (right) recorded from orbicularis oris inferior (OOI)
for three /iCₙu/ utterances. Lower panels (1b): EMG onset time
(ms before line-up point) vs. consonant string duration for
/iCₙu/ utterances. Time 0 represents the release of the consonant
occlusion, determined from the acoustic waveform. The arrows
indicate the average of the acoustic onsets of the consonant
strings.

The  statistically  determined  onsets  of  rounded  vowel  activity  are  plotted 
as  a  function  of  consonant  string  duration  for  the  nine  minimal  pairs  for  sub¬ 
ject  TB,  and  for  eight  minimal  pairs  for  subject  CH  (Figure  4).  In  contrast 
to the consonant-related EMG activity (see Figures 1 and 2), these onsets bear
no obvious relation to the durations of the consonant strings.⁴ Rather, with
the  exception  of  the  /i#tu/  utterance,  they  occur  within  a  fairly  restricted 
range,  bearing  a  stronger  relationship  to  the  onset  of  the  rounded  vowel  than 
to  the  onset  of  the  consonant  string. 

The EMG data thus show the following: First, for these two subjects,
some lip activity appears to be inherent in the production of alveolar
consonants. Second, the onset of EMG activity for /u/ appears to be related to
the acoustic onset of that vowel, and not to the compatibility of the vowel and
consonant articulations. Finally, even when there is lip activity for
adjacent consonants and vowels, they appear to be organized as independent
gestures, as the separate peaks of OOI activity for the /iCₙu/ utterances
suggest.

Figure 5 shows movement data for both subjects, for the same /iCₙu/
utterances whose EMG data are presented above (Figure 1a). For TB, the data
show a substantial forward lip movement in the vicinity of the acoustic onset
of the consonant string, a position that is then sustained through V₂.
However, while there is a less obvious separation between the consonant and
vowel gestures in the movement than in the EMG records, there are troughs in
the movement traces for all but the shortest utterance.⁵ For subject CH, the
anterior lip movement associated with the rounded vowel is more clearly
separated from the anterior movement occurring earlier in the utterance.

When the movement traces for the /iCₙu/ and /iCₙi/ utterances are
superimposed (Figure 6), the pattern is the same as that for the EMG records.
That is, regardless of the identity of V₂, the curves are nearly identical
through the consonant string, diverging in the vicinity of the onset of the
second vowel. However, because of hardware limitations at the time of
recording, the baselines for these data are not always aligned;⁶ for this
reason we were unable to determine statistically the times at which each
minimally contrastive pair differed, as we had done for the EMG data.
Furthermore, when the temporal relationships between the consonant-related EMG
and the earliest anteriorly directed movements are examined, there are clearly
differences for the two subjects. For subject TB, the earlier onset of OOI
activity is associated with consonant-related forward lip movement. That is,
there is an appropriate contraction time interval between the EMG and
corresponding movement (Figure 7a). For subject CH, however, the earlier OOI
activity is not associated with any significant anterior lip movement for the
consonant string (Figure 7b). Rather, this movement is associated with the
first vowel.

We are therefore faced with the question of what the consonant-related
EMG activity means in terms of movement for subject CH. Figure 8 shows OOI
activity for the three representative /iCₙu/ utterances, along with both the
corresponding horizontal and vertical movement traces. It can be seen that,
while the EMG and horizontal lip movements are poorly correlated in the
vicinity of the consonant string, there is a good temporal correlation (i.e.,
contraction interval) between the consonant-related EMG and vertical lip
movement. Thus, for this subject, the same muscle appears to be contributing
to both vertical movement (in the production of the consonant string) and
horizontal movement (in the production of the vowel), differences in
orbicularis oris function that have been noted previously (cf. O'Dwyer, Quinn,
Guitar, Andrews, & Neilson, 1981).

Figure 5. Antero-posterior lip position as a function of time for the two
subjects for three /iCₙu/ utterances. The arrows indicate the
average of the acoustic onsets of the consonant strings.

Figure 6. Antero-posterior lip position for both subjects as a function of
time for three minimally contrastive pairs of /iCₙV/ utterances.

Discussion 

The  data  offered  here  suggest  that  there  are  a  number  of  reasons  for  the 
difficulty  in  reconciling  the  differences  between  sets  of  previously  reported 
data  on  the  extent  of  anticipatory  coarticulation.  One  of  these  reasons  re¬ 
sides  in  the  unproven  assumptions  that,  if  a  speech  sound's  articulation  has 
not  been  described  as  including  a  particular  gesture,  then,  first,  that  ges¬ 
ture  has  little,  if  any,  consequence  for  the  production  of  the  sound  and,  sec¬ 
ond,  that  speech  sound  is  "unspecified"  for  that  gesture/feature.  However, 
phoneticians  have  long  known  that  the  description  of  the  articulation  of 
speech sounds is incomplete (cf. Pike, 1943, p. 152); our data clearly
indicate that, for some speakers at least, some alveolar consonants traditionally
assumed  to  have  no  intrinsic  lip  gestures  do  in  fact  have  such  gestures  as 
part  of  their  natural  production.  Thus,  the  assumption  that  these  consonants 
are  neutral  with  regard  to  lip  configuration  is  untenable. 

These data also provide evidence of the complexity of the
electromyographic and kinematic data collected for studying coarticulation
processes. First, it is impossible to separate active protrusion gestures from
passive relaxation of lips that have been retracted, except by observing the
activity of the muscles responsible for those protrusion gestures. Second, the
EMG data may more closely reflect the underlying segmental structure of speech
than do kinematic data. For example, while we see no trough in the movement
traces of the /i#tu/ utterance for subject TB, there are clearly separate peaks
of OOI activity for both the consonant and vowel segments, suggesting the
segmental nature of the underlying articulatory organization.

In  addition  to  providing  insights  into  the  causes  of  some  of  the  apparent 
discrepancies resulting from problems in experimental design, we would also
suggest  that  another  source  of  conflict  in  attempts  to  develop  a  single  model 
of  anticipatory  phenomena  stems  from  presupposing  that  the  timing  of  the  onset 
of  rounding  is  an  entirely  anticipatory  phenomenon.  It  is  notable  that  in 
both  this  study  and  our  earlier  work  (Bell-Berti  &  Harris,  1981),  the  onset  of 
vowel-related  lip  rounding  is  closer  to  the  acoustic  onset  of  the  rounded 
vowel  for  sequences  of  the  form  /i#tu/  than  for  any  other  sequence.  This  re¬ 
sult  might  seem  to  provide  some  limited  support  for  the  feature  migration  hy¬ 
pothesis,  if  this  sequence  were  compared  with  only  one  longer  sequence  (see, 
e.g.,  Sussman  &  Westbury,  1981).  However,  we  believe  that  an  equally  plausi¬ 
ble  explanation  is  that  the  result  reflects  the  suppression  of  lip  rounding 
until  the  first  vowel  can  be  completed  without  distortion.  That  is,  the  onset 
of  rounding  may  be  constrained  by  the  carryover  effects  of  a  preceding 
(unrounded) vowel. Thus, in a sequence like /i#tu/, where the vowel-to-vowel
interval  is  fairly  short,  the  rounding  onset  might  be  delayed  relative  to  oth¬ 
er  sequences  where  the  consonantal  sequence  occupies  a  longer  time  slot.  In 
fact,  Sussman  and  Westbury's  (1981)  observation  of  systematic  differences  in 
the  onset  of  lip  rounding  as  a  function  of  the  identity  of  the  preceding 
unrounded  vowel  may  be  interpreted  as  evidence  of  the  same  carryover  effect. 

Summary

These  data  were  part  of  a  study  designed  to  account  for  conflicting  re¬ 
sults  of  previously  reported  studies  by  suggesting  that  at  least  some  of  the 
apparent  discrepancy  arises  from  experimental  design.  Because  our  two  sub¬ 
jects  produced  alveolar  consonants  with  significant  orbicularis  oris  activity 
in  both  rounded  and  unrounded  vowel  environments,  we  were  able  to  establish 
that those gestures that were variable in their onsets on both the EMG and
movement  levels  were  clearly  tied  to  something  that  was  acoustically  variable 
as  well — namely,  the  onsets  of  consonant  strings  of  differing  durations.  We 
also observed separate consonant- and vowel-related activity, as in the EMG
records of the /iCₙu/ utterances, where there were almost always distinct
peaks  for  each.  Furthermore,  our  EMG  data  may  be  interpreted  as  reflecting  a 
stable  onset  of  lip  rounding  independent  of  consonant  string  duration,  except 
for  the  case  of  the  shortest  consonant  string.  And,  while  the  tendency  has 
been  to  view  all  of  these  phenomena  as  reflecting  only  anticipatory  coarticu¬ 
lation,  we  believe  it  more  likely  that  they  represent  the  combined  effect  of 
carryover  and  anticipatory  processes. 

Footnotes

¹We have limited ourselves here primarily to a consideration of
anticipatory phenomena. This limitation was imposed because most theoretical
discussions have focused on anticipatory coarticulation.

²The literature in this area contains two different indices to consonant
string length: the number of consonant segments (e.g., Daniloff & Moll, 1968;
Lubker & Gay, 1982) and the duration of the consonant sequence (e.g.,
Bell-Berti & Harris, 1979, 1982; Engstrand, 1981). Although these two
measures are related, the relationship is not isomorphic (see, for example,
Table 1).


³Subject TB is a speaker of educated Greater Metropolitan New York City
English. Subject CH is a speaker of educated Central Florida English.

⁴This result is compatible with results of other studies using subjects
known  to  produce  the  alveolar  consonants  /s/  and  /t/  without  lip  rounding 
(cf.  Bell-Berti  &  Harris,  1982;  Engstrand,  1981),  although  these  studies 
clearly  still  subscribe  to  the  possibility  that  alveolar  consonants  have 
inherently  neutral  lip  specifications. 

⁵The observation of "troughs" in EMG and movement records is not new
(cf.  Bell-Berti  &  Harris,  1979;  Engstrand,  1983;  Gay,  1977).  The  fact  that  a 
trough  is  absent  in  movement  records  when  the  intervocalic  consonant  is  short 
may  not  reflect  differences  in  gestural  organization,  but,  rather,  biomechani¬ 
cal  constraints  that  could  influence  the  response  characteristics  of  the  lips. 
That  is,  with  movement  being  rather  slow  relative  to  EMG  activity,  it  is  hard¬ 
ly  surprising  that  the  lips  do  not  have  time  to  protrude,  retract,  and  pro¬ 
trude  again  for  the  rounded  vowel  during  the  75  ms  /t/  closure. 

⁶We would note, however, that there was no consistent pattern of DC
offsets between the /iCₙi/ and /iCₙu/ utterances, suggesting that these
differences  were  independent  of  vowel  rounding. 

Abstract. In two experiments subjects read aloud pairs of nonsense
syllables  rapidly  presented  on  a  display  screen  or  repeated  the  same 
syllables  presented  auditorily.  The  error  patterns  in  both  experi¬ 
ments  showed  significant  asymmetry,  thus  lending  support  to  explana¬ 
tions  of  the  error  generation  process  that  consider  certain  phonemes 
to  be  "stronger"  than  others.  Further  error  analyses  revealed 
substantial  effects  of  phoneme  frequency  in  the  language  and  effects 
of  phoneme  similarity,  which  depended  on  the  feature  system  used  to 
index  similarity.  Phoneme  availability  (the  requirement  that  an 
intruding  phoneme  be  part  of  the  currently  presented  stimulus)  was 
also  important  but  not  essential.  We  argue  that  the  experimental 
elicitation  of  errors  provides  critical  tests  of  hypotheses  generat¬ 
ed  by  the  analysis  of  naturally  occurring  speech  errors. 

Recent  interest  in  speech  errors  has  focused  largely  on  the  evidence  such 
errors  provide  about  levels  of  linguistic  analysis  and  psychological  models  of 
the  speech  production  process.  For  example,  Fromkin  (1971),  basing  her  analy¬ 
sis  on  a  corpus  of  naturally  occurring  speech  errors,  found  evidence  in  sup¬ 
port  of  the  independence  of  various  levels  of  linguistic  analysis,  including 
both phonemes and phonetic features. On the other hand, Garrett (1980), also
basing  his  analysis  on  spontaneous-error  collections,  examined  speech  error 
distributions  for  the  constraints  they  provide  about  a  model  of  sentence 
production. 

The development of experimental techniques for the elicitation of speech
errors (see, for example, Motley & Baars, 1976) provides a new source of data,
which,  when  used  in  conjunction  with  the  evidence  from  naturally  occurring  er¬ 
rors,  greatly  facilitates  the  modeling  of  speech  error  generation.  As  Fowler 
(1983)  points  out,  the  experimental  elicitation  of  speech  errors  permits  tape 
recording  of  subjects'  responses  so  that  errors  are  less  likely  to  be  misheard 
or  overlooked.  Furthermore,  experimental  elicitation  provides  more  thorough 
tests  of  hypotheses  generated  by  the  analysis  of  spontaneous  error  collec¬ 
tions,  especially  when  portions  of  the  error  pattern  in  the  naturally  occur¬ 
ring  corpus  are  based  on  relatively  few  examples.  On  the  other  hand,  there  is 
always  the  danger  of  introducing  influences  in  the  laboratory  that  do  not  ap¬ 
ply  in  more  natural  settings. 

Shattuck-Hufnagel  and  Klatt  (1979)  analyzed  collections  of  naturally 
occurring  segment  substitution  errors  and  contrasted  two  types  of  error 
generation  explanations.  In  the  case  of  the  first  type  of  explanation,  it  is 
assumed  that  some  segments  are  "strong"  whereas  others  are  "weak."  Strong 
segments  might  be  those  that  occur  more  frequently  in  the  language,  are  ac¬ 
quired  earlier,  are  unmarked  in  phonological  theory,  or  are  easier  to 
articulate.  The  precise  definition  of  segment  strength  is  less  important  than 
the role strong segments play. Each segment substitution error has an
intended, or target, segment source for the intruding error. The explanation
predicts that strong segments appear more often as intrusions, whereas weak
segments appear more often as targets in segmental substitution errors. A
confusion matrix of such speech errors should thus be asymmetrical. This
asymmetry would reflect the pattern of strength versus weakness of the
segments involved.

In  the  case  of  the  second  type  of  explanation,  on  the  other  hand,  the 
tendency  of  one  segment  (y)  to  substitute  for  another  segment  (x)  would  be 
related  to  their  degree  of  similarity,  but  substitutions  of  x  to  y  and  y  to  x 
would  be  equally  frequent.  A  confusion  matrix  of  speech  errors,  if  such  er¬ 
rors  arose  as  predicted  by  this  type  of  explanation,  should  thus  be  symmetri¬ 
cal. 
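
The contrast between the two predictions can be made concrete with a symmetry statistic computed over a confusion matrix. The sketch below uses Bowker's test of symmetry, which is one conventional choice and is offered only as an illustration; it is not claimed to be the analysis used by Shattuck-Hufnagel and Klatt. The counts are hypothetical, and the function name `bowker_statistic` is an assumed name.

```python
def bowker_statistic(matrix):
    """Bowker's symmetry statistic for a square table of counts.

    matrix[i][j] is the number of times segment j intruded on target i.
    Returns (chi-square value, degrees of freedom). Pairs of cells whose
    combined count is zero are skipped and do not add a degree of freedom.
    """
    size = len(matrix)
    chi2 = 0.0
    df = 0
    for i in range(size):
        for j in range(i + 1, size):
            total = matrix[i][j] + matrix[j][i]
            if total > 0:
                chi2 += (matrix[i][j] - matrix[j][i]) ** 2 / total
                df += 1
    return chi2, df


# Hypothetical counts for three segments (rows = targets, columns = intrusions).
hypothetical_counts = [
    [0, 30, 25],
    [10, 0, 5],
    [8, 5, 0],
]
chi2, df = bowker_statistic(hypothetical_counts)
print(f"chi-square = {chi2:.2f} on {df} df")
```

Under the "strength" explanation the statistic should be large, because x-to-y and y-to-x counts differ; under the similarity-only explanation it should stay near zero.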


Shattuck-Hufnagel and Klatt (1979) analyzed the confusion matrix generated
by 1620 substitution errors. The matrix proved to be asymmetrical. However,
further analysis revealed that the asymmetry was due almost exclusively to
four consonant segments /s, š, č, t/, such that errors of the type /s/ to /š/,
/s/ to /č/, and /t/ to /č/ were all more frequent, respectively, than /š/ to
/s/, /č/ to /s/, and /č/ to /t/. Once this source of asymmetry was removed,
the confusion matrix of segmental errors was no longer significantly
asymmetrical. However, the pattern of errors for /s, š, č, t/, which
contributed most to the asymmetry of the matrix, could not be accounted for by
stronger segments intruding more often, since, according to Shattuck-Hufnagel
and Klatt, /š/ and /č/, for example, are less frequent and acquired later than
/s/ (i.e., they are weaker), yet they intruded more often.

Shattuck-Hufnagel  and  Klatt  proposed  to  account  for  the  asymmetrical  pat¬ 
tern  of  their  confusion  matrix  in  terms  of  a  palatalization  mechanism.  They 
checked  their  corpus  for  factors  that  might  "palatalize"  the  pronunciation  of 
a non-palatal consonant (e.g., /s/ becoming /š/), but no difference was found
between  the  source  consonant  environments  in  which  palatalizing  and 
non-palatalizing  errors  occurred.  When  the  vowel  environments  of  the  target 
utterances were examined, Shattuck-Hufnagel and Klatt found that a palatalizing
error occurred proportionately more often before a high vowel (e.g., /i/),
but  that  this  difference  was  not  statistically  significant.  However,  their 
calculations  were  based  on  a  relatively  small  number  of  observations.  The  ef¬ 
fect  of  the  following  vowel  might  indeed  be  reliable  given  a  larger  number  of 
observations. 

The  authors  concluded  that  the  evidence  from  their  data  suggests  that  er¬ 
rors  arise  during  the  speech  production  process  when  one  of  two  simultaneously 
available  segments  is  mis-selected  for  a  slot  in  an  utterance,  with  the  two 
segments  generally  being  equally  likely  to  be  mis-selected. 

Notice, however, that an explanation assuming that phonemes are not equal
in strength, in particular one for which a strong segment is defined as a more
frequent segment in the language,¹ does not receive a fair test in a corpus of
naturally collected errors, because the prior probabilities of occurrence for
all the segments are not equal. Imagine an explanation of the error
generation process according to which segment strength is defined by segment
frequency and similar segments are likely to substitute for one another. Such
an explanation would predict that the rate at which a frequent segment would
be mispronounced given that it was intended would be lower than the rate at
which an infrequent segment would be mispronounced given that it was intended.
So, for example, for /s/ and /š/, similar segments that might easily be
confused, with /s/ as the stronger because it is more frequent, the rate of
/s/ being mispronounced given that it was intended should be lower than the
rate of /š/ being mispronounced. But the collection of naturally generated
speech errors reflects the frequency of occurrence of phonemes in English, not
just the error rates given that the phonemes are intended. Thus, since /s/ is
much more frequent in the language than /š/, it will occur much more often as
an intended phoneme, so that it will occur more frequently as a target than
/š/, even if its rate of occurrence as a target given that it was intended is
lower. Furthermore, /š/, which is likely to substitute for /s/ because it is
very similar, will appear more often as an intrusion than as a target, because
of the high prior probability or frequency of /s/ as an intended phoneme. Note
that the asymmetry arises because of the segmental similarity of /s/ and /š/
and a great discrepancy in their relative frequencies of occurrence in
English. An experimental elicitation of errors using these segments in source
utterances provides a good way of avoiding the problem of unequal frequencies
of occurrence, because in the experimental situation, the intended utterances
can be assigned equal prior probabilities. If frequency contributes to
segment strength and if strength is a factor in the error generation process,
then /s/ should appear more often as an intrusion and /š/ more often as a
target, in the controlled experimental situation.
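
The argument about prior probabilities can be followed with a small worked example. The sketch below assumes purely hypothetical numbers: conditional error rates of 1% for /s/ and 3% for /š/ (so that /s/ is the "stronger" segment), and two sets of intended-phoneme counts, one unequal as in natural speech and one equal as in an experiment. The spellings `s` and `sh`, and all names in the code, are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical conditional error rates (assumptions, not data from any corpus):
# probability that the segment is replaced by the other, given it was intended.
RATE_S_TO_SH = Fraction(1, 100)   # /s/ is "strong": it is rarely replaced
RATE_SH_TO_S = Fraction(3, 100)   # /sh/ is "weak": it is replaced more often


def expected_substitutions(intended):
    """Expected error counts given how often each segment was intended."""
    return {
        "s->sh": intended["s"] * RATE_S_TO_SH,
        "sh->s": intended["sh"] * RATE_SH_TO_S,
    }


# Natural corpus: /s/ is intended far more often than /sh/ (hypothetical counts).
natural = expected_substitutions({"s": 1000, "sh": 100})

# Experiment: both segments are intended equally often.
balanced = expected_substitutions({"s": 500, "sh": 500})

print("natural: ", {k: int(v) for k, v in natural.items()})
print("balanced:", {k: int(v) for k, v in balanced.items()})
```

With these assumed values, the natural corpus shows more /s/ to /š/ errors than the reverse, so the weaker segment appears to intrude more often; once the priors are equalized, the direction reverses and the stronger segment is the more frequent intrusion.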

Intuitively, /s/ and /š/ seem quite similar, but similarity between two
segments  has  not  been  clearly  defined  in  the  speech  error  context,  although 
several  investigators  (Fromkin,  1971;  MacKay,  1970;  Nooteboom,  1969)  have  dis¬ 
cussed  the  role  of  features  in  the  error  generation  process.  One  way  of 
defining  segment  similarity  might  be  on  the  basis  of  the  number  of  shared  fea¬ 
tures.  Clearly,  the  choice  of  a  particular  feature  system  can  be  crucial. 
Given  a  particular  feature  system,  segments  might  need  to  share  all  or  almost 
all  features  and  only  differ  on  some  single  individual  feature  (e.g.,  anterior 
or  high)  or  type  of  feature  (e.g.,  features  for  place  of  articulation)  for  er¬ 
rors  to  occur  frequently.  The  role  of  segment  similarity  can  be  assessed  in 
two  ways:  1)  Does  the  similarity  of  two  segments  in  an  utterance  affect  the 
tendency  of  subjects  to  make  errors  on  those  segments  and  2)  Given  that  an  er¬ 
ror  has  occurred,  how  similar  is  the  intruding  phoneme  to  its  intended  target? 
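
Similarity defined by shared features can be sketched as a count of differing feature values. The feature table below is a deliberately simplified assumption in the general spirit of a binary feature system; it is not a faithful transcription of any published system, and a different system could yield different distances, which is the point made above about the choice of feature system. The spellings `s`, `sh`, `ch`, `t`, `th` and the function name are illustrative.

```python
# Simplified, assumed binary feature values (1 = plus, 0 = minus).
FEATURES = ("anterior", "coronal", "high", "continuant", "strident")
SEGMENTS = {
    "s":  (1, 1, 0, 1, 1),
    "sh": (0, 1, 1, 1, 1),
    "ch": (0, 1, 1, 0, 1),
    "t":  (1, 1, 0, 0, 0),
    "th": (1, 1, 0, 1, 0),
}


def differing_features(a, b):
    """Names of the features on which segments a and b take different values."""
    return [
        name
        for name, x, y in zip(FEATURES, SEGMENTS[a], SEGMENTS[b])
        if x != y
    ]


for pair in [("sh", "ch"), ("th", "t"), ("s", "th"), ("s", "sh")]:
    print(pair, differing_features(*pair))
```

Counting the length of the returned list gives a simple distance; inspecting its contents shows whether a one-feature difference involves the same feature across pairs.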

Another issue is whether it is necessary for the target and intruding
segments  to  be  simultaneously  available  for  a  substitution  error  to  occur  that 
involves  them.  In  a  very  broad  view,  the  availability  of  a  segment  as  an  er¬ 
ror  source  should  be  a  function  of  its  frequency  in  the  language.  A  narrower 
view  might  define  segment  availability  such  that  the  source  of  an  error  need 
occur  within  a  relatively  constrained  portion  of  the  intended  utterance.  One 
could  assess  this  narrower  view  of  availability  experimentally  by  seeing 
whether  substitutions  of  y  for  x  are  more  likely  to  occur  when  y  is  part  of 
the  stimulus. 

Finally,  it  may  be  that  Shattuck-Hufnagel  and  Klatt's  observed  asymmetry 
involving /s, š, č, t/ does reflect a palatalizing mechanism but there were
insufficient  observations  in  the  environment  of  high  vowels  or  palatal  conso¬ 
nants.  Again,  the  experimental  situation  permits  a  direct  test  of  this  hy¬ 
pothesis. 

The  basic  technique  for  the  experimental  elicitation  of  speech  errors  in¬ 
volves  what  Baars  (1980)  calls  the  "competing  plans  framework."  Essentially, 
the  subject  is  given  two  alternative  plans  for  the  production  of  an  utterance 
and  is  required  to  make  a  rapid  response.  For  example,  the  subject  might  see 
the  series  of  word  pairs  "give  book,  go  back,  get  boot,  bad  goof"  flashed 
rapidly  on  a  screen.  Notice  that  the  fourth  word  pair,  the  test  pair,  "bad 
goof"  involves  a  reversal  of  the  initial  consonant  pattern  found  in  the  first 
three  pairs,  the  bias  pairs.  After  the  test  pair,  at  the  sound  of  a  buzzer, 
the  subject  would  be  expected  to  say  the  now-occluded  final  pair  as  quickly  as 
possible.  Under  these  conditions,  a  number  of  subjects  will  produce  a  speech 
error  and  may  even  spoonerize  the  test  pair,  reversing  the  initial  consonants, 
and  say  "gad  boof"  instead. 

We adapted this basic technique for the purposes of our study. Since
previous work (Baars, Motley, & MacKay, 1975) has shown that there is output
monitoring for the lexical status of spoonerized words (e.g., that "gad boof,"
which contains two non-lexical items, will occur less often as an error for
"bad goof" than "darn bore," which contains two lexical items, will occur as
an error for "barn door" in a similar sequence), we chose pairs of nonsense CV
syllables as stimuli.² In pilot work, we found that subjects tended to make a
greater number of errors when they were asked to pronounce both the bias and
test items than when they pronounced only the test items. Hence, we required
subjects to pronounce all of the items flashed before them on a screen.³
Furthermore, pilot work indicated that when the bias pairs had a consistent
vowel pattern (e.g., compare the bias series "right lean, ripe leap, ride
leak" with the one given above), more errors tended to occur than when the
vowel pattern was inconsistent (see also Dell, 1984). Thus, we restricted our
bias pairs to those with consistent vowel patterns. We created our CV stimuli
from the four consonants in Shattuck-Hufnagel and Klatt's data base that had
been responsible for the initial asymmetry, /s, š, č, t/, plus the additional
consonant phoneme /θ/. The addition of /θ/ allowed us to test whether
similarity, defined as a single feature difference, depends on a specific
feature, since the consonants in the pairs /š, č/ and /θ, t/ differ on the
single feature continuant, according to Chomsky and Halle (1968), whereas the
consonants in the pair /s, θ/ differ on the single feature strident. The
consonant /θ/ also provides another relatively infrequent, but non-palatal,
phoneme to test against the infrequent palatal set /š, č/. We chose the
vowels /a, i, u/ for the test set, so as to be able to assess whether vowel
height, high /i, u/ versus low /a/, or vowel height and frontness, front high
/i/ versus /a, u/, might be the possible source of palatalizing errors.

Experiment 1

In  Experiment  1,  pairs  of  CV  nonsense  stimuli  were  presented  visually, 
and  subjects  were  asked  to  read  all  presented  items  as  rapidly  as  possible. 

Method 


Materials. Using the set of consonant phonemes /s, š, č, t, θ/, written
as s, sh, ch, t, and th, respectively, and the set of vowels /a, i, u/, we
constructed pairs of CV nonsense syllables. Since we eliminated pairs with
matched consonants as well as those with matched vowels (e.g., sa ta), there
were twenty possible consonant permutations and six possible vowel
permutations for a total of 120 test stimuli. A set of 120 filler pairs of CV
nonsense syllables was analogously constructed using another set of consonants
/r, l, b, v, m/ and the same set of vowels /a, i, u/.
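
The count of 120 test stimuli follows from enumerating ordered pairs of distinct consonants and ordered pairs of distinct vowels. The sketch below reproduces that enumeration using the written forms of the consonants; the variable names are illustrative assumptions, and the order in which the pairs are generated carries no significance.

```python
from itertools import permutations

CONSONANTS = ["s", "sh", "ch", "t", "th"]   # written forms of the five consonants
VOWELS = ["a", "i", "u"]

# Ordered pairs with no repeats: 5 * 4 = 20 consonant pairs, 3 * 2 = 6 vowel pairs.
consonant_pairs = list(permutations(CONSONANTS, 2))
vowel_pairs = list(permutations(VOWELS, 2))

# Each stimulus is a pair of CV syllables with different consonants and vowels.
test_stimuli = [
    (c1 + v1, c2 + v2)
    for c1, c2 in consonant_pairs
    for v1, v2 in vowel_pairs
]

print(len(consonant_pairs), len(vowel_pairs), len(test_stimuli))
```

The same construction applied to the filler consonants would yield the 120 filler pairs.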

Design. Each of the 120 test stimuli was preceded by three identical
bias pairs of nonsense syllables that were constructed analogously to the test
CV pair set and in which the order of the vowels was preserved but that of the
consonants was switched. For example, for the test stimulus [...], the
presentation order was su [...]. In order to prevent subjects from
anticipating a switch after three identical CV nonsense pairs, 30 of the test
CV nonsense syllables were also presented as distractors in groups of four
(e.g., [...]), 30 in groups of three, 30 in pairs, and 30 singly. The 120
filler CV nonsense pairs also served to divert subjects' attention from the
test stimuli consonants and pattern of presentation. Thirty of the filler CV
nonsense syllables were presented in groups of four, 30 in groups of three, 30
in groups of two, and 30 singly. For half the trials with the filler
syllables, the last item preserved the consonant order (e.g., [...]); for the
other half of the trials the last item reversed the consonant order (e.g.,
[...]). The presentation of the test stimuli, distractors, and filler
sequences was in pseudorandom order with the constraint that there were four
test sequences, four filler sequences, and four distractor sequences in every
block of twelve sequences. There was a total of 1080 pairs of CV nonsense
syllables presented to subjects.
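
The total of 1080 syllable pairs can be checked against the design just described. The sketch below tallies the pairs contributed by test, distractor, and filler sequences; the number of blocks is a derived consequence of the stated constraint (twelve sequences per block) rather than a figure given in the text, and all names are illustrative assumptions.

```python
GROUP_SIZES = (4, 3, 2, 1)        # sequence lengths used for distractors and fillers
SEQUENCES_PER_GROUP_SIZE = 30     # thirty sequences at each length

# Test sequences: three identical bias pairs followed by the test pair.
test_sequences = 120
test_pairs = test_sequences * (3 + 1)

# Distractor and filler sequences share the same layout.
distractor_sequences = SEQUENCES_PER_GROUP_SIZE * len(GROUP_SIZES)
distractor_pairs = SEQUENCES_PER_GROUP_SIZE * sum(GROUP_SIZES)
filler_sequences = distractor_sequences
filler_pairs = distractor_pairs

total_pairs = test_pairs + distractor_pairs + filler_pairs
total_sequences = test_sequences + distractor_sequences + filler_sequences
blocks_of_twelve = total_sequences // 12   # derived, not stated in the text

print(test_pairs, distractor_pairs, filler_pairs, total_pairs, blocks_of_twelve)
```

The three sequence types occur in equal numbers, which is what allows every block of twelve to contain four of each.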

Subjects.  Thirteen  men  and  women  participated  in  the  experiment.  Four 
were  volunteers  from  the  Haskins  Laboratories  staff  (who  were  relatively 
knowledgeable  phonetically),  and  nine  were  Yale  University  undergraduates 
receiving  course  credit  for  their  participation.  (Five  additional  subjects 
[one  volunteer  and  four  students]  were  tested,  but  their  data  were  not  ana¬ 
lyzed  because  they  failed  to  read  a  substantial  number  of  the  syllable  pairs, 
and  it  was  often  not  possible  to  determine  what  syllable  pair  they  were 
responding  to  when  they  did  utter  something.) 

Apparatus  and  procedure.  The  pairs  of  CV  nonsense  syllables  were 
projected under program control onto the self-refreshing screen of a Decgraphic
11 GT-40 computer terminal hooked up to a PDP 11/45 computer at the rate of
two syllable pairs a second. Subjects were asked to pronounce each syllable
pair aloud as accurately as possible. During this task, subjects listened to
white noise presented over Grason-Stadler TDH 39-300Z headphones in order to
encourage  them  to  speak  up  as  loudly  as  possible  and  to  minimize  their  ability 
to  monitor  their  own  utterances.  Subjects'  responses  to  the  stimuli  were 
recorded  via  a  Sony  F-27S  microphone  onto  a  Sony  cassette  tape  recorder  model 
TC-110B  for  later  analysis. 
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Subjects  were  told  that  the  nonsense  syllables  they  would  see  would  be 
composed of three vowel sounds, spelled as i, a, and u. They were instructed
to pronounce the letter i as /i/ as in the word eat, the letter a as /a/ as in
the word father, and the letter u as /u/ as in the word boot. They were also
told to pronounce the letter pair th as in the word think, sh as in shoe, and
ch as in church. Subjects were then shown CV nonsense syllable pairs
typewritten  on  a  sheet  of  paper  and  asked  to  read  them  aloud.  Their 
pronunciation  was  checked,  and  if  they  did  not  pronounce  the  letters  as 
instructed,  they  were  asked  to  do  so.  There  were  29  CV  nonsense  pairs  from 
the  filler  set  presented  first  to  subjects  as  practice  with  the  computer  appa¬ 
ratus. 

Results 


Subjects'  responses  to  all  1080  CV  stimulus  pairs  were  transcribed  by  one 
listener  and  then  checked  by  another.  Across  the  13  subjects,  there  were  185 
disagreements (1.3%), which were resolved by relistening to the disputed pairs


Table 1

Feature Differences Separating Consonants in a Pair and Error Frequencies for
Test Stimuli in Experiment 1 as a Function of Consonant Pair and Vowel Pair

Feature Differences                      Vowel Pair
C&H   B&G   Consonant Pair   ai   ia   au   ua   ui   iu   Total

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until  a  consensus  was  reached.  A  response  was  scored  as  an  error  if  the  pair 
deviated  in  any  way  from  the  stimulus;  thus,  null  responses  were  scored  as  er¬ 
rors.  The  results  for  the  120  test  stimuli  are  summarized  in  Table  1  in  terms 
of  error  frequencies  as  a  function  of  consonant  pair  and  vowel  pair. 


As  is  clear  from  Table  1,  the  vowel  pairs  did  not  have  consistent  effects 
on  error  rates.  An  analysis  of  variance  was  conducted  on  the  error  data 
summed  across  vowel  pairs  in  order  to  determine  the  significant  effects  due  to 
consonant  pairs.  Two  factors  were  included  in  the  analysis,  one  for  the  ten 
different  combinations  of  consonants,  and  the  second  to  assess  the  effect  of 
consonant  frequency  on  error  rates,  such  that  the  first  permutation  of  the 
consonant  pair  had  the  more  frequent  of  the  two  consonants  preceding  the  less 
frequent  consonant  (with  frequency  determined  by  Dewey,  1923),  and  the  second 
permutation  had  the  less  frequent  consonant  preceding  the  more  frequent  one, 
as  revealed  in  the  ordering  of  Table  1.  Both  main  effects  were  significant. 
The consonant pairs were significantly different from one another,
F(9, 108) = 6.89, p < .0001, and consonant pairs for which the less frequent
consonant preceded the more frequent consonant had a significantly greater
number of errors, F(1, 12) = 5.76, p = .0335. The interaction of consonant
pairs and frequency was not significant, F(9, 108) = 1.50, p = .1560.
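
The degrees of freedom reported above are what one would expect if both factors were treated as within-subject factors, each tested against its own interaction with subjects. The sketch below illustrates that bookkeeping for the 13 subjects of Experiment 1; the function name is hypothetical, and the choice of error terms is an assumption, since the text does not spell out the ANOVA model.

```python
# Degrees of freedom for a fully within-subject (repeated-measures) ANOVA.
# Assumption: each effect is tested against its interaction with subjects.
# The function name is illustrative only.

def repeated_measures_df(n_subjects, *factor_levels):
    """Return (effect df, error df) for a main effect or an interaction.

    Pass one level count for a main effect, or several for an interaction.
    """
    effect_df = 1
    for levels in factor_levels:
        effect_df *= levels - 1
    error_df = effect_df * (n_subjects - 1)
    return effect_df, error_df


N_SUBJECTS = 13
print(repeated_measures_df(N_SUBJECTS, 10))      # ten consonant combinations
print(repeated_measures_df(N_SUBJECTS, 2))       # two frequency orders
print(repeated_measures_df(N_SUBJECTS, 10, 2))   # their interaction
```

The same rule accounts for the feature-class analyses, where four classes give F(3, 36) and three classes give F(2, 24).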

Feature analysis 1. A further analysis was performed on the same data in
order  to  test  the  hypothesis  that  the  number  of  feature  differences  between 
each  consonant  in  a  target  pair  was  crucial  in  determining  the  error  rate. 
Since  there  are  a  variety  of  competing  feature  analyses  and  since  the  choice 
of  a  single  feature  system  could  bias  our  results,  we  chose  to  contrast  two 
phonetic  feature  systems:  the  well-known  system  devised  by  Chomsky  and  Halle 
(1968), henceforth C & H, and another one derived from a corpus of speech
errors in English and German by van den Broecke and Goldstein (1980), henceforth
B  &  G.  First,  the  consonant  pairs  were  divided  into  four  feature  difference 
classes  according  to  C  &  H  (see  Table  1),  and  errors  were  averaged  across  con¬ 
sonant  pairs  in  each  class.  The  main  effect  of  feature  difference  class  was 
not significant, F(3, 36) = 1.09, p = .3672. Furthermore, the error rate did not
monotonically  increase  or  decrease  with  the  number  of  feature  differences,  and 
the  error  rate  for  the  consonant  pair  sh-ch  differed  greatly  from  that  for 
th-t,  though  both  consonant  pairs  differ  on  the  same  single  feature. 

Next,  the  consonants  were  divided  into  three  feature  difference  classes 
according  to  B  &  G  (see  Table  1).  With  this  feature  set,  the  main  effect  of 
feature difference class was significant, F(2, 24) = 14.22, p = .0002. The mean
number  of  errors  per  subject  for  consonant  pairs  differing  on  one  feature  was 
2.2,  on  two  features,  1.4,  and  on  three  features,  1.2. 

Substitution  errors.  A  separate  analysis  was  made  of  substitution  er¬ 
rors,  in  which  the  correct  consonant  in  a  syllable  of  a  test  stimulus  was  re¬ 
placed  by  another  consonant  in  the  stimulus  set.  The  resulting  confusion  ma¬ 
trix  is  presented  in  Table  2. 

In  order  to  determine  whether  the  relative  frequency  with  which  each  con¬ 
sonant  segment  intrudes  is  the  same  as  the  frequency  with  which  it  appears  as 
a target, we computed a χ² statistic comparing the two distributions and found
that they were in fact significantly different from one another, χ²(4) = 69.1,
p < .01. One striking discrepancy between the previous study by
Shattuck-Hufnagel and Klatt (1980) and ours concerns the asymmetrical pattern of
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Table  2 


Substitution Errors in Experiment 1 as a Function of Target Consonant and
Intrusion Consonant

                       Target
Intrusion     T     S    SH    CH    TH   TOTAL
T             -    10     6     5    27      48
S             6     -    54     6    10      76
SH            4    28     -    49     5      86
CH            6     6    26     -    18      56
TH            5     5    10     9     -      29
TOTAL        21    49    96    69    60     295

substitutions involving sh and s. In the earlier study, there were more
replacements of s by sh than vice versa, whereas the opposite was found in the
present study. This discrepancy may be attributable in part to visual factors.
Perhaps consonant segments that contain the same letters (sh/s and
th/t) are particularly likely to be confused, especially in the direction of
letter deletion. An analysis that eliminates such confusions, by combining
the sh and s segments and the th and t segments, yields a marginally
significant difference between the target and intrusion distributions,
χ²(2) = 4.8, p < .10.
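
The five-category χ² value can be recovered from the marginal totals of Table 2. The sketch below assumes that the columns of Table 2 are targets and the rows are intrusions, that the target total for s is 49 (the value implied by the entries in that column), and that the test is a goodness-of-fit comparison in which the target totals serve as expected frequencies. These are inferences from the table rather than statements in the text, and the function name is illustrative.

```python
# Goodness-of-fit comparison of intrusion totals against target totals
# (Experiment 1, marginal totals of Table 2).
# Assumption: target totals are treated as the expected frequencies.

TARGET_TOTALS = {"t": 21, "s": 49, "sh": 96, "ch": 69, "th": 60}
INTRUSION_TOTALS = {"t": 48, "s": 76, "sh": 86, "ch": 56, "th": 29}


def chi_square_goodness_of_fit(observed, expected):
    """Sum of (O - E)^2 / E over the categories shared by both mappings."""
    return sum(
        (observed[key] - expected[key]) ** 2 / expected[key]
        for key in expected
    )


chi_square = chi_square_goodness_of_fit(INTRUSION_TOTALS, TARGET_TOTALS)
degrees_of_freedom = len(TARGET_TOTALS) - 1

print(round(chi_square, 1), degrees_of_freedom)
```

Both sets of totals sum to 295 substitution errors, so no rescaling of the expected frequencies is needed before the comparison.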


Frequency  analysis.  To  determine  whether  the  incidence  of  errors  for 
each  target  consonant  phoneme  is  related  to  the  log  frequency  of  that  segment 
in  English,  we  computed  a  Pearson  Product-Moment  correlation  coefficient 
relating  the  frequency  with  which  each  of  the  five  consonants  occurred  as  a 
target  to  its  log  frequency  in  English  (Dewey,  1923).  As  expected  according 
to  the  strength  explanation,  there  was  a  negative  correlation,  although  it  did 
not reach standard levels of statistical significance, r(3) = -.696, p > .10. A
significant negative correlation was found when the frequency analysis of
Shattuck-Hufnagel and Klatt (1979) was used instead of that of Dewey (1923),
r(3) = -.887, p < .05. This new frequency analysis, henceforth the content count,
was derived from the speech sample of Carterette and Jones (1974) and includes
only  content  words,  not  function  words  or  common  bound  morphemes."* 


A  similar  analysis  was  conducted  to  compare  intrusion  frequency  and  log 
frequency  in  the  language.  The  correlations  in  this  case  were  not  significant 
for the Dewey (1923) count, r(3) = .284, p > .10, nor for the content count,
r(3) = -.054, p > .10.


In view of the high correlations for target frequency and despite the low
correlations for intrusion frequency, frequency in the language in addition to
visual confusions may be a source of the asymmetry in intrusions noted
earlier. In order to test this hypothesis, for the ten consonant pairs (e.g.,
ch-t), we compared how often the more frequent phoneme intruded for the less
frequent phoneme (t for ch) rather than vice versa (ch for t). For one test
we used the Dewey count, which yielded a significant difference, t(9) = 2.41,
p < .05, and for a second test we used the more recent content count, which was
not significant, t(9) < 1. By both counts, the more frequent phoneme in the pair
intruded more often on the average than did the less frequent phoneme, in
accord with a strength explanation of speech errors.
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Feature  analysis  2.  A  second  feature  analysis  was  performed  on  the 
substitution data to see whether more substitutions of y for x occur when x
and y differ by a single phonetic feature than when they differ by more. For
the C & H features, the mean number of substitution errors involving a change
of one feature was 20, of two features 24, of three features 6, and of four
features 9. Clearly, though one- and two-feature changes are more frequent
than three- and four-feature changes, there is not a monotonic decrease in the
number of substitution errors as the number of feature changes increases.
Indeed, sh-ch and th-t, which differ on the same single feature according to C &
H, show mean substitution rates of 38 and 16, respectively. Furthermore,
there are complementary asymmetries in the substitution rates for these two
pairs (see Table 2) such that the feature change to [+ continuant] involves
fewer errors for the pair t-th but more errors for the pair ch-sh.

For  the  B  &  G  features,  the  mean  number  of  substitution  errors  involving 
a  change  of  one  feature  was  23,  of  two  features,  8,  and  of  three  features,  10. 
Although  there  is  not  a  perfect  monotonic  decrease  in  the  number  of  substitu¬ 
tion  errors  as  the  number  of  feature  changes  increases,  it  is  clear  that  the 
single  feature  substitution  errors  are  most  frequent. 

Availability  analysis.  A  further  analysis  was  performed  on  the  substitu¬ 
tion  errors  to  assess  the  role  of  segment  availability.  We  determined  the 
number  of  times  a  substitution  error  of  y  for  x  occurred  in  the  environment  of 
y  (i.e.,  how  often  did  the  intrusion  phoneme  /t/  occur  for  the  target  phoneme 
/s/ when the test consonant pair was t-s or s-t). By comparing that number to
the  overall  number  of  y  for  x  substitutions,  we  determined  the  percentage  of 
times  that  a  substitution  occurred  when  the  error  was  part  of  the  intended 
utterance  (see  Table  3).  For  substitution  errors  of  y  for  x,  y  was  part  of 
the intended utterance 47.5% of the time. Since x was paired with phonemes
other  than  y  three  times  as  often  as  it  was  paired  with  y,  the  appropriate 
chance  percentage  is  25%.  Hence,  segment  availability  in  the  stimulus  does 
seem  to  influence  error  rate.  However,  it  clearly  is  not  necessary  for  the 
intruding  phoneme  to  be  part  of  the  intended  utterance,  since  the  majority  of 
the  substitutions  of  y  for  x  occur  when  y  is  not  part  of  the  intended  utter¬ 
ance,  defined  narrowly  here  as  the  test  CV  nonsense  syllable  pair. 
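
The availability percentage and its chance baseline can be sketched from the totals in Table 3 (140 of the 295 substitutions occurred with the intruding phoneme present in the test pair). The function names below are illustrative, and the chance figure simply restates the reasoning in the text: each target is paired with the intruder once for every three pairings with some other consonant.

```python
# Availability of the intruding phoneme in Experiment 1 (totals from Table 3).
# Names are illustrative; the chance level follows the 1-in-4 pairing argument.

def availability_percent(n_available, n_total):
    """Percentage of x-to-y substitutions made when y was in the test pair."""
    return 100 * n_available / n_total


def chance_percent(other_pairings_per_own_pairing=3):
    """Chance that y is in the test pair, given how often x meets other consonants."""
    return 100 / (1 + other_pairings_per_own_pairing)


observed = availability_percent(140, 295)
chance = chance_percent()

print(round(observed, 1), chance)
# Individual rows of Table 3 can be checked the same way, e.g. sh replaced by s:
print(round(availability_percent(16, 54), 1))
```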

Furthermore, phoneme frequency seems to influence the importance of
availability. When the direction of the substitution error involves a change from
a relatively more frequent (strong or +) to a relatively less frequent (weak
or -) phoneme (see Table 3), then it is more important that the infrequent
segment be available than when the direction of the substitution involves a
change from a relatively weak to a relatively strong phoneme. Thus, by the
Dewey count of phoneme frequency, when a change involves strong (+) to weak
(-), the weak segment is available 58.1% of the time, whereas when the change
involves weak (-) to strong (+), the strong segment is available only 41.6% of
the time, t(9) = 3.19, p < .05. The same pattern obtains with the content count
(53.8% from strong (+) to weak (-), 42.3% from weak (-) to strong (+)),
although the latter set of differences is not significant, t(9) < 1.

On  the  other  hand,  the  availability  of  the  intruding  phoneme  did  not  vary 
regularly  with  the  number  of  feature  differences  separating  each  consonant 
pair. By the C & H feature set, the intruding phoneme was available 42.6% of
the time when there was a single feature difference between the consonants in
a pair, 41.8% of the time when there were two feature differences, 55.3% of
the time when there were three feature differences, and 70.3% of the time when
there  were  four  feature  differences.  Although  this  pattern  suggests  the 
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Table  3 


Relative Frequency of Target Phoneme (x) and Intruding Phoneme (y) and
Percentage of Errors of the Type x Changes to y When y Was Available in the
Stimulus in Experiment 1

Target 

Intruding 

Relative 

Freq . 

Number  of  x  to 

y  Errors 

Phoneme 

Phoneme 

Dewey 

Content  y  available 

Total 

$ 

X 

y 

X 

y 

X 
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sh 

ch 

+ 

- 

- 

+ 

13 

26 

50.0 

ch 

sh 

- 

+ 

- 

24 

49 

49  .0 

t 

th 

+ 

- 

+ 

- 

2 

5 

40.0 

th 

t 

- 

+ 

- 

5 

27 
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s 

th 

> 

- 

+ 

- 

3 

5 

60.0 

th 

s 

- 

+ 

- 

+ 

5 

10 

50.0 

s 

sh 

+ 

- 

+ 

- 

15 

28 

53.6 

sh 

s 

- 

+ 

- 

+ 

16 

54 

29.6 

t 

s 

> 

- 

+ 

- 

4 

6 

66.7 

s 

t 

- 

+ 

- 

+ 

6 

10 

60.0 

sh 

th 

- 

- 

8 

10 

80.0 

th 

sh 

- 

+ 

4> 

- 

3 

S 

60.0 

t 

ch 
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- 
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2 

6 

33.3 
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+ 
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3 

6 

50.0 
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3 

6 
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4 

4 

100.0 

sh 
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- 
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+ 

4 

6 

66.7 

ch 

th 

- 

- 

+ 

7 

9 

77.8 

th 

ch 

+ 

— 

1 1 

18 

61.1 

TOTAL 

140 

295 

47.5 

possibility that it is more important that the intruding phoneme be available
when consonant pairs differ by three or more features, it is not confirmed in
the pattern of availability for the B & G features. In that case, the intruding
phoneme was available 46.5% of the time when the consonants in a pair differed
on a single feature, 57.6% of the time when they differed on two features, but
only 35.7% of the time when the consonants differed on three features.

Discussion 


The results of Experiment 1 show that the likelihood of an error occurring
for a given segment in a test pair depends in part on the relative frequency
in English of the individual segments in the pair. Thus, the matrix
generated by the substitution errors showed significant asymmetry. There was
a high negative correlation between the frequency of an error occurring for a
target segment and its log frequency of occurrence in English as well as
evidence that a more frequent segment is more likely to intrude for a less
frequent segment than vice versa.
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Segment  similarity  clearly  influences  the  generation  of  speech  errors, 
although  the  pattern  of  errors  and  substitutions  is  more  interpretable  when 
segment  similarity  is  based  on  the  B  &  G  rather  than  the  C  &  H  feature  set. 

Finally,  availability  of  the  source  segment  along  with  the  target  segment 
within  the  intended  utterance,  although  important,  does  not  seem  to  be  a  nec¬ 
essary  factor,  but  its  role  increases  when  the  intended  segment  is  higher  in 
frequency  than  the  one  that  replaces  it. 

In  order  to  assure  that  the  results  of  Experiment  1,  in  which  the  stimuli 
were  visually  presented,  were  not  an  artifact  of  the  visual  modality,  we 
redesigned  our  materials  for  auditory  presentation  in  Experiment  2. 

Experiment  2 

Tongue twisters (e.g., "she sells sea shells") often result from
conflicting vowel and consonant patterns. For example, there is an ABBA
(/š/-/s/-/s/-/š/) consonant pattern and an ABAB (/i/-/ɛ/-/i/-/ɛ/) vowel
pattern in the well-known tongue twister cited above. Our CV nonsense test
syllables were presented auditorily to subjects in this tongue twister format,
four syllables at a time, such that the consonant pattern of presentation was
ABBA and the vowel pattern ABAB, and subjects were asked to repeat the
sequence of four syllables as quickly and as accurately as possible.

Method 


Stimuli. The test consonant phonemes /s, š, č, t, θ/ and vowels /a, i,
u/ of Experiment 1 were used in Experiment 2. Each of the possible CV nonsense
pairs (eliminating all identical consonant and identical vowel possibilities)
was joined with a CV nonsense pair in which the order of the consonants
changed but the vowels remained the same (e.g., sa ^ ^ ^). There was a total
of 120 such four-syllable stimuli. Each of the original 15 syllables (5
consonants x 3 vowels) was recorded by one of the investigators (AGL),
digitized at 20 kHz, and stored on tape. All of the four-syllable nonsense CV
stimuli were thus produced from the same original 15 syllables. There were 300
ms between syllables in a four-syllable string and a 5 s ISI between stimuli.
There were no distractor or filler sequences.
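
The construction of the four-syllable stimuli can be sketched as an enumeration over ordered consonant pairs and ordered vowel pairs. The spellings below follow the orthography used in the instructions to subjects (s, sh, ch, t, th); the function name and the tuple representation are illustrative assumptions, not the authors' materials.

```python
# Enumerate the ABBA-consonant / ABAB-vowel quadruples of Experiment 2.
# Assumption: each ordered consonant pair is crossed with each ordered vowel pair.
from itertools import permutations

CONSONANTS = ("s", "sh", "ch", "t", "th")
VOWELS = ("a", "i", "u")


def build_quadruples(consonants=CONSONANTS, vowels=VOWELS):
    """Return (C1V1, C2V2, C2V1, C1V2) for all distinct consonant and vowel pairs."""
    quadruples = []
    for c1, c2 in permutations(consonants, 2):
        for v1, v2 in permutations(vowels, 2):
            quadruples.append((c1 + v1, c2 + v2, c2 + v1, c1 + v2))
    return quadruples


stimuli = build_quadruples()
print(len(stimuli))
print("-".join(stimuli[0]))
```

The 20 ordered consonant pairs crossed with the 6 ordered vowel pairs give the 120 stimuli reported, and the enumeration includes sequences of the same form as ta-si-sa-ti, the example given to subjects.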

Design. The stimuli were presented in pseudorandom order in six blocks
of 20 each with the following constraints: No consonant occurred on two
successive trials, each of the 20 consonant pairs occurred once in each block,
and each vowel pair occurred once with each consonant pair in the test and
either 3 (4 pairs) or 4 (2 pairs) times per 20-trial block.

Subjects. Eighteen men and women from the University of Colorado
participated  in  the  experiment  and  received  course  credit  in  an  introductory 
psychology  class. 

Apparatus and procedure. The stimuli were transmitted to the subject
binaurally through a pair of Telephonics earphones (Model TDH-39). The
stimulus tape was played with a TEAC A-3300S tape recorder at a comfortable
listening level. The subjects spoke into a Superscope Model EC-1 condenser
microphone that was attached to an Optisonics Sound-O-Matic II cassette tape
recorder.
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The  subjects  were  told  that  they  would  hear  a  series  of  four-syllable  se¬ 
quences.  They  were  instructed  that  the  four  syllables  in  each  sequence  would 
all  be  composed  of  a  consonant  sound  followed  by  a  vowel  sound  and  that  the 
consonants  would  always  be  presented  in  an  ABBA  pattern,  and  the  vowels  would 
always  be  in  an  ABAB  pattern.  They  were  given  as  an  example  the  four-syllable 
sequence ta-si-sa-ti, which has a t-s-s-t (or ABBA) consonant pattern and an
a-i-a-i (or ABAB) vowel pattern. The subjects were further told that there
were only five different initial consonants (s as in sigh; t as in tie; th as
in thigh; sh as in shy; and ch as in child) and only three different vowels
(/a/ as in cot; /i/ as in eat; and /u/ as in boot).

The  subjects'  task  was  to  repeat  aloud  into  the  microphone  each 
four-syllable  sequence  they  heard  as  quickly  as  possible  without  making  er¬ 
rors.  They  were  told  to  try  to  say  all  four  syllables  and  guess  if  necessary. 
They  were  instructed  not  to  worry  if  they  made  a  mistake  or  had  trouble  re¬ 
peating  a  sequence  but  to  listen  carefully  for  the  sequence  following  the  one 
they missed and to try and keep up with the tape. The subjects were then
given three practice trials spoken by the experimenter (sa-ti-ta-si;
chu-tha-thu-cha; shi-su-si-shu).

Results 

Subjects' responses to all 120 test stimulus quadruples were transcribed
by one listener and then checked by another. Across the 18 subjects, there
were 340 discrepancies (3.9%), which were resolved by a third listener.
However, since a great number of these disagreements involved confusions of /θ/
and /f/ and since /f/ was not a possible stimulus, all responses of /f/ were
replaced by /θ/ (there were 718 /f/ responses [8.3%] that were replaced in
this way). Each syllable was scored separately and was determined to be an
error if it deviated in any way from the stimulus. The results for the 120
test stimuli are summarized in Table 4 in terms of error frequencies as a
function of consonant pair (ABBA) and vowel pair (ABAB).
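
The percentages quoted for transcriber discrepancies and for /f/ responses are consistent with a denominator of all syllables produced in Experiment 2 (18 subjects x 120 quadruples x 4 syllables). That denominator is an assumption, since the text does not state it; the sketch below checks the arithmetic and, for comparison, the corresponding figure for Experiment 1, where the unit was the syllable pair.

```python
# Percentages of transcription discrepancies and replaced /f/ responses.
# Assumption: Experiment 2 percentages are taken over all syllable responses,
# and the Experiment 1 percentage over all pair responses.

def percent(count, total):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / total, 1)


exp2_syllables = 18 * 120 * 4   # subjects x quadruples x syllables per quadruple
exp1_pairs = 13 * 1080          # subjects x CV pairs per subject

print(percent(340, exp2_syllables))   # transcriber discrepancies, Experiment 2
print(percent(718, exp2_syllables))   # /f/ responses rescored as th, Experiment 2
print(percent(185, exp1_pairs))       # transcriber disagreements, Experiment 1
```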

As in Experiment 1, the vowel pairs did not have consistent effects on
error rates. An analysis of variance was conducted on the error data summed
across vowel pairs to assess the effects due to consonant pairs. The
consonant pairs differed significantly from one another, F(9, 153) = 14.17,
p < .0001, and the quadruples for which the less frequent sound was heard first
had significantly more errors, F(1, 17) = 15.92, p = .0009.

Feature analysis 1. As for Experiment 1, the consonant pairs were first
divided into four feature-difference classes by the C & H feature system (see
Table 4), and errors were averaged across consonant pairs in each class. The
main effect of feature-difference class was marginally significant,
F(3, 51) = 2.59, p = .0632, but the error rate again did not monotonically
increase or decrease with the number of feature differences.

Next,  the  consonant  pairs  were  divided  into  three  feature  difference 
classes  by  the  B  &  G  feature  system  (see  Table  4).  The  main  effect  of  feature 
difference class was significant, F(2, 34) = 16.24, p < .0001. The mean number of
errors  per  subject  for  consonant  pairs  differing  on  one  feature  was  11.2,  on 
two  features,  9.2,  and  on  three  features,  8.1. 


Levitt & Healy: The Roles of Phoneme Frequency, Similarity, and Availability


Table  5 

Substitution  Errors  in  Experiment  2  as  a  Function  of  Target  Consonant  and 
Intrusion  Consonant 
                       Target
Intrusion     T     S    SH    CH    TH   TOTAL
SH           63   171     -   112    69     415
CH          166   162   312     -   126     766
TH          131   151   144   134     -     560
TOTAL       455   616   880   648   533

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Substitution  errors.  As  in  Experiment  1,  a  separate  analysis  was  made  of 
the  substitution  errors  in  which  the  correct  consonant  sound  was  replaced  by 
another consonant in the stimulus set (see Table 5). To evaluate the extent
to which the relative frequency that each consonant segment intruded
corresponds to the frequency that it appeared as a target, we computed a χ²
statistic comparing the two distributions and found that they were in fact
significantly different from each other, χ²(4) = 391.8, p < .01, as in
Experiment 1. Also in agreement with Experiment 1, but unlike the study by
Shattuck-Hufnagel and Klatt (1980), we found more replacements of sh by s than
vice versa.

Frequency analysis. As for Experiment 1, we computed two sets of
correlation coefficients to determine the relation between the log frequency in
the language of a given consonant segment and its frequency of occurrence as a
target or intrusion. For targets, the correlations were negative, as
expected, but nonsignificant for both the Dewey count, r(3) = -.352, p > .10,
and the content count, r(3) = -.658, p > .10. For intrusions, the correlations
were positive but not significant, for Dewey, r(3) = .331, p > .10, and for the
content count, r(3) = .505, p > .10. To evaluate whether frequency in the
language may account for the asymmetry in intrusion errors, we compared how
often the more frequent phoneme in a pair intruded for the less frequent phoneme
rather than vice versa. The more frequent phoneme intruded more often on the
average for both counts. This difference was significant by the content count,
t(9) = 3.20, p < .05, but not by the Dewey count, t(9) < 1.

Feature analysis 2. For the C & H features, the mean number of
substitution errors involving a change of one feature was 174, of two features
172, of three features 157, and of four features 141. Although substitution
errors monotonically decreased as feature differences increased, again, sh-ch
and th-t, which differ on the same single feature according to C & H, show mean
substitution rates of 212 and 140, respectively.

For  the  B  &  G  features,  the  mean  number  of  substitution  errors  involving 
a  change  of  one  feature  was  180,  of  two  features,  152,  and  of  three  features, 
120.  Again,  there  is  a  monotonic  decrease  as  the  number  of  feature  differ¬ 
ences  increases. 

Availability analysis. For substitutions of y for x, y was part of the
intended utterance 41.6% of the time (see Table 6), a percentage which is
substantially higher than that expected on the basis of chance alone (25%).

Phoneme frequency again appears to have an effect on the importance of
availability. When the direction of substitution goes from a strong (+) to a
weak (-) segment, the weak segment is available 47.9% of the time by the
content count, and when the direction of substitution goes from a weak segment
(-) to a strong segment (+), the strong segment is available 37.3% of the time
by the content count, t(9) = 2.93, p < .05. The same pattern obtains with the
Dewey count (42.1% from strong (+) to weak (-), 41.0% from weak (-) to strong
(+)), although the latter set of differences was not significant, t(9) < 1.

We found only a slight trend indicating that the intruding phoneme is
less available when consonant pairs differ by a single feature than when they
differ by more features for either feature set. For C & H, the intruding
phoneme was available 37.9% of the time when there was a single feature
difference between the consonants in a pair, 45.7% of the time when there were
two feature differences, 42.2% of the time when there were three feature
differences, and 42.4% of the time when there were four feature differences.
For B & G, the intruding phoneme was available 40.0% of the time when there
was a single feature difference between the consonants in a pair, 42.3% of the
time when there were two feature differences, and 42.0% of the time when there
were three feature differences.

General  Discussion 

Comparison of Experiments 1 and 2. In Experiment 1 we considered the
possibility that visual confusions contributed to the error pattern in that
experiment. Experiment 2 provides an important control, since the stimuli in
Experiment 2 were presented auditorily. Once we had corrected in Experiment 2
for the common auditory confusion of /f/ and /θ/, we found that the results of
the two experiments were very similar. In fact, the Pearson Product-Moment
correlation coefficient comparing the target phoneme frequencies in
Experiments 1 and 2 showed a significant correlation, r(3) = .915, p < .05. When
the intrusion phoneme frequencies of the two experiments were compared, we found
a nonsignificant negative correlation, r(3) = -.250, p > .10. Although the exact
patterns of intrusions for the two experiments did not correspond, the t tests
reported earlier did show an effect of phoneme frequency on intrusions for
both  experiments.  The  error  frequencies  for  the  twenty  consonant  pairs  them- 
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selves  were  also  highly  correlated  in  the  two  experiments.  Recall  that  the 
test  stimuli  in  Experiment  1  were  CV  nonsense  pairs  (e.g.,  M)  and  in 
Experiment  2  they  were  CV  nonsense  quadruples  (e.g.,  ^  ^  sa  ^).  Thus  error 
frequencies  for  the  stimuli  in  Experiment  1  were  compared  separately  for  the 
first  two  and  the  second  two  syllables  in  Experiment  2.  When  the  number  of 
errors  for  each  CV  pair  in  Experiment  1  was  compared  with  the  number  of  errors 
for  the  first  two  syllables  of  the  CV  nonsense  quadruple  in  Experiment  2,  the 
resulting correlation was statistically significant, r(18) = .772, p < .01. When
the error frequencies of Experiment 1 were compared with those of the second
two syllables of the CV nonsense quadruples of Experiment 2, the correlation
was again statistically significant, r(18) = .539, p < .05. Finally, the error
frequencies for the first two and the second two syllables of the CV nonsense
quadruples in Experiment 2 were compared and again the correlation was significant,
r(18) = .772, p < .01. Though visual confusions may have had a small effect
on the error pattern of Experiment 1 and auditory confusions (most clearly
those involving /f/ and /θ/) did occur in Experiment 2, the patterns of errors
in the two experiments are clearly very similar. These patterns point to
the  importance  of  phoneme  frequency  in  the  error  generation  process. 
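
The correlations reported above follow the convention r(df), where df = n - 2. Thus r(3) corresponds to five consonants and r(18) to twenty consonant pairs. The following is a minimal sketch of how such a statistic is computed. The counts are hypothetical placeholders, not the data from either experiment, and the function name `pearson_r` is our own.

```python
import math


def pearson_r(xs, ys):
    """Return (r, df) for two equal-length sequences, with df = n - 2."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with n >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / math.sqrt(sxx * syy), n - 2


# Hypothetical target counts for five consonants in two experiments
experiment_1 = [12, 30, 45, 60, 80]
experiment_2 = [10, 28, 50, 55, 90]
r, df = pearson_r(experiment_1, experiment_2)
print(f"r({df}) = {r:.3f}")
```

With only three degrees of freedom, r must be large to reach significance, which is worth bearing in mind when comparing the target and intrusion correlations.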

The role of phoneme frequency. We can see two ways in which phoneme frequency
had an effect on our results. In the first place, when we examined our
data for errors of any type, we found in both experiments that consonant pair
stimuli in which the first consonant was less frequent than the second (e.g.,
ch-t) tended to produce more errors than consonant pair stimuli in which the
first consonant was more frequent (e.g., t-ch).

In  the  second  place,  when  we  examined  substitution  errors  restricted  to 
the  test  consonant  set,  we  found  that  phoneme  frequency  in  English  showed  a 
negative  correlation  with  target  frequencies.  We  also  found,  when  we  looked 
at  the  ten  consonant  combinations,  that  the  more  frequent  phoneme  of  the  pair 
was more likely to intrude as an error for the other member than vice versa.
These  findings  lend  support  to  an  explanation  of  the  error  generation  process 
in  which  phoneme  strength  is  determined  by  phoneme  frequency.  Thus  we  find  a 
negative  correlation  between  target  phoneme  frequency  and  frequency  of 
occurrence  in  English  because  more  frequent  or  stronger  phonemes  are  less 
likely  to  function  as  targets  or  mispronounced  segments.  On  the  other  hand, 
more  frequent  or  stronger  phonemes  are  somewhat  more  likely  to  function  as 
intrusions.

These  effects  of  frequency  emerge  in  the  experimental  elicitation  of  er¬ 
rors  because  we  were  able  to  control  the  prior  probabilities  of  occurrence  of 
the  individual  phonemes.  With  equal  prior  probabilities,  we  find  an  asymmetr¬ 
ical  pattern  of  substitution  errors.  However,  the  asymmetrical  pattern  that 
emerges from our data is different from the one found initially by
Shattuck-Hufnagel and Klatt: We find no evidence for a palatalizing mechanism,
since we find more non-palatalizing (e.g., sh to s) than palatalizing (e.g., s
to sh) substitution errors in both experiments.

There  is  always  the  danger  in  an  experimental  situation  that  some  factor 
that  does  not  operate  in  the  spontaneous  error  generation  process  was  intro¬ 
duced.  We  used  nonsense  syllables  as  stimuli  rather  than  English  words,  in 
order  to  eliminate  effects  of  lexical  frequency  and  lexical  bias  in  the  error 
generation  process,  but  nonsense  syllables  may  behave  differently  than  English 
words.  For  example,  in  an  experiment  designed  to  elicit  speech  errors,  in 
which  she  had  subjects  read  or  recall  tongue  twisters,  Shattuck-Hufnagel 
(1982) found a differential pattern of errors in the recall condition when she
compared  CVC  nonsense  syllables,  CVC  English  words,  CVC  nonsense  syllables 
embedded  in  short  phrases,  and  CVC  English  words  embedded  in  short  phrases. 
Only  the  CVC  English  words  embedded  in  short  phrases  showed  a  higher 
percentage  of  word-initial  errors  (as  is  found  in  naturally  occurring  speech 
errors),  whereas  all  the  other  eliciting  sets  showed  a  higher  percentage  of 
word-final  errors.  However,  this  result  was  obtained  largely  through  a  reduc¬ 
tion  in  the  number  of  word-final  errors  in  the  CVC  English  words  embedded  in 
phrases  as  compared  to  the  other  conditions.  Furthermore,  since  we  used  only 
CV syllables in our study, and we found very few vowel errors, our errors were
almost entirely in word-initial position. Thus, we believe that the differences
between our findings and those of Shattuck-Hufnagel and Klatt (1980) are
due  largely  to  the  differences  in  prior  probabilities  of  the  phoneme  targets 
and  are  not  due  to  factors  introduced  by  our  experimental  method  or  use  of 
nonsense  syllables. 

In our view, phoneme strength is a function of phoneme frequency rather
than ease of articulation, age of acquisition, or status in phonological theory
of the phoneme in question. With respect to articulation, in comparing /s/
and /š/, Borden and Harris (1980) point out that "a wide range of openings beyond
those for /s/ result in /š/ type sounds" (p. 121). So we see that articulation
of the phoneme /s/ is more precise and therefore presumably more difficult
(see also Anderson, 1942). In contrast, there are claims in the
literature (e.g., Lester & Skousen, 1974) that /s/ is acquired earlier than
/š/. Closer examination of the data reveals that children often produce an
/s/-like fricative phoneme or stop where the adult model has an /s/ (Ferguson,
1978; Moskowitz, 1973) before they produce a phoneme for words in which the
adult model has an /š/, probably because of the higher frequency of /s/. However,
that correct articulation of /s/ is often acquired rather late is clear
from reports of speech therapists (Anderson, 1942; Berry & Eisenson, 1947) and
others (Ingram, Christensen, Veach, & Webster, 1980; Sander, 1972; Velleman,
1983) who attest to its difficulty. Finally, although in phonological markedness
theory, as outlined by Chomsky and Halle (1968), /s/ is less marked than
/š/, in a more general test of phonological markedness in the elicitation of
speech errors, Motley and Baars (1975) did not find markedness to be a significant
factor.

Frequency in the language is then for us the best index of a phoneme's
strength.  We  believe  that  frequent  phonemes  are  "stronger"  than  infrequent 
ones  because  they  are  the  more  common  of  highly  overlearned  motor  patterns. 
In  this  view,  we  see  single  segment  errors  involving  similar  segments  as  exam¬ 
ples  of  Norman's  (1981)  capture  errors:  "when  a  sequence  being  performed  is 
similar  to  another  more  frequent  or  better  learned  sequence,  the  latter  may 
capture  control"  (p.  6).  The  initial  gestures  relevant  to  the  pronunciation 
of /s/ and /š/ are no doubt very similar, if not identical. It is easy to see
how the gestures to produce an /š/ could be "captured" by the more frequent
/s/  gestures. 

Segment similarity. Do speech error rates or patterns of substitutions
depend on minimal feature differences between consonant pairs? The answer to
such a question depends on the feature system one chooses. Ideally, one would
like to find that a system motivated on independent grounds, such as the one
devised by Chomsky and Halle (1968), also captures in a principled way the
structural relationships in speech errors. Indeed, van den Broecke and Goldstein
(1980) compared a number of feature systems, along with the one they devised
on the basis of English and German speech errors, and found that
"feature systems designed without incorporating evidence from speech errors
are all capable of showing meaningful structure in phonological speech errors
as they occur" (p. 63). Nonetheless, segment similarity emerges as a significant
effect in our data only when we use the B & G features to determine segment
similarity. That the segment similarity effects in our data are best
demonstrated by the B & G features, derived from the analysis of naturally
occurring speech errors in English and German, suggests that the errors we
find in our experimental situation are analogous to those occurring in collections
of naturally occurring utterances.


Availability. When naturally occurring speech errors are analyzed, the
assumption is often made that errors are most likely to occur when similar
segments are simultaneously available. Yet the results of our experiments
suggest that availability, here defined in narrow terms as a substitution of x
for y when x is part of the stimulus, is important but not necessary, since
the percentage of the x for y substitutions in both experiments that occur
when x is part of the stimulus is substantially greater than the chance value
but no greater than 50%. Indeed, the substitution errors in the corpus examined
by Shattuck-Hufnagel and Klatt (1979) include 30% with no known source
word. It is possible that the actual proportion of naturally occurring speech
errors that have no source in the surrounding context might be higher than
that estimated by Shattuck-Hufnagel and Klatt, and it might be wrong to assume
in such cases that the intruding error was part of the intended utterance (see
Harley, 1984, for a discussion of higher level non-plan-internal errors).
Finally, we find that segment availability becomes increasingly important as the
frequency of the intruded phoneme decreases and perhaps, to a lesser extent,
as the featural similarity between the intruded and target phonemes decreases.


However, it is difficult to compare the relative magnitudes of the effects
of phoneme frequency and availability (see Sechrest & Yeaton, 1982).
Moreover, the influence of phoneme frequency on the importance of availability
suggests that both effects may stem from the same activation mechanism. The
frequency effect may be reflecting differences in the base activation levels
of phonemes, whereas the availability effect may reflect transient increases
in phoneme activation that result from being part of the intended utterance.⁵
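
One way to see how a single activation mechanism could yield both effects is the toy sketch below. It is an illustration under our own assumptions, not a model fitted to the data: the base activation values are hypothetical, the transient boost is a fixed additive constant, and selection follows a simple proportional (Luce-style) choice rule.

```python
def choice_probabilities(base_activation, available=(), boost=1.0):
    """Each phoneme's selection probability is its share of total activation."""
    total = {
        phoneme: level + (boost if phoneme in available else 0.0)
        for phoneme, level in base_activation.items()
    }
    norm = sum(total.values())
    return {phoneme: level / norm for phoneme, level in total.items()}


def availability_gain(base_activation, phoneme, boost=1.0):
    """Ratio of a phoneme's probability when available to when unavailable."""
    without = choice_probabilities(base_activation)[phoneme]
    with_boost = choice_probabilities(base_activation, {phoneme}, boost)[phoneme]
    return with_boost / without


# Hypothetical base activations, ordered from most to least frequent phoneme
base = {"t": 5.0, "s": 4.0, "sh": 2.0, "ch": 1.5, "th": 1.0}
for phoneme in base:
    print(phoneme, round(availability_gain(base, phoneme), 3))
```

Under these assumptions the same additive boost raises the selection probability of a low-activation phoneme proportionally more than that of a high-activation one, which is the direction of the interaction between frequency and availability described above.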




Conclusions. The results of our two experiments provide support for an
explanation of the speech error generation process in which a segment's
strength is a function of its frequency of occurrence in English: Weak (or
infrequent) segments tend to serve as targets whereas strong (or frequent)
segments tend to serve as intrusions. The role of phoneme frequency is a
consistently important one. Phoneme availability also plays a role, though perhaps
more restricted than expected. Furthermore, availability may be reflecting
the same activation mechanism responsible for the frequency effect.
Finally, the notion that the segments that interact in speech errors are likely
to be similar is best supported by our data when segment similarity is defined
in terms of a feature set derived from naturally occurring speech errors.

Footnotes

¹Motley and Baars (1975) found experimental evidence that consonant frequency
in initial position affects the tendency of initial consonants in pairs
of CVC nonsense words to interchange. Hence, frequency in the language seems
like an appropriate initial index of phoneme strength.

²Although none of the CV nonsense pairs represented common lexical items
as visually presented, six of them did represent common lexical items as
pronounced: si = 'see'; shi = 'she'; ti = 'tea'; su = 'sue'; shu = 'shoe'; tu =
'two.'

³It is possible that this rapid reading procedure is influenced by
articulatory interference of the type involved in tongue twisters as well as
by the factors producing higher-level slips of the tongue. However, Cohen
(1973) found that the pattern of speech errors induced via a rapid reading
procedure was of a very similar nature to that of a naturally collected
corpus.

⁴The rank order of the consonant phonemes by the Dewey (1923) count is
t>s>sh>ch>th, whereas by the content count it is t>s>th>ch>sh.

⁵We are indebted to Marcel Just for making this point.

Abstract. Every language, spoken or signed, deploys a large lexicon,
made possible by permutation and combination of a small set of
linguistic elements. In speech, rapid interleaving of the gestures
that form these elements (consonants and vowels) leads to a complex
acoustic signal in which the boundaries between elements are lost.
However, for the child learning to speak, the initial task is not to
recover these elements, but simply to imitate the sound pattern that
it hears. Studies of "lipreading" in adults and infants suggest
that imitation is mediated by an amodal representation, closely
related to the dynamics of articulation, and that a left-hemisphere
perceptuomotor mechanism specialized to make use of this representation
develops during the first six months of life. By drawing on
this specialized mechanism, the infant learns the recurrent patterns
of acoustic structure and articulatory gestures from which linguistic
segments must be presumed to emerge.

As  a  system  of  animal  communication,  language  has  the  distinctive  proper¬ 
ty  of  being  open,  that  is,  fitted  to  carrying  messages  on  an  unlimited  range 
of  topics.  Human  cognitive  capacity  is,  of  course,  greater  than  that  of  other 
animals,  but  this  may  be  a  consequence  as  much  as  a  cause  of  linguistic  range. 
Other primate communication systems have a limited referential scope (sources
of food or danger, personal and group identity, sexual inclination, emotional
state, and so on) and a limited set of no more than 10-40 signals (Wilson,
1975,  p.  183).  In  fact,  10-40  holistically  distinct  signals  may  be  close  to 
the  upper  range  of  primate  perceptual  and  motor  capacity.  The  distinctive 
property  of  language  is  that  it  has  finessed  that  upper  limit,  by  developing  a 
double  structure,  or  dual  pattern  (Hockett,  1958). 

The  two  levels  of  patterning  are  phonology  and  syntax.  The  first  permits 
us  to  develop  a  large  lexicon,  the  second  permits  us  to  deploy  the  lexicon  in 
predicating  relations  among  objects  and  events.  My  present  concern  is  entire¬ 
ly  with  the  first  level.  A  six-year-old  middle-class  American  child  already 
recognizes some 13,000 words (Templin, 1957), while an adult's recognition vocabulary
may be well over 100,000. Every language, however primitive the
culture  of  its  speakers  by  Western  standards,  deploys  a  large  lexicon.  This 
is  possible  because  the  phonology,  or  sound  pattern,  of  a  language  draws  on  a 
small  set  (roughly  between  20  and  100  elements)  of  meaningless  units — conso¬ 
nants  and  vowels — to  construct  a  very  large  set  of  meaningful  units,  words  (or 
morphemes).  These  meaningless  units  may  themselves  be  described  in  terms  of  a 
smaller  set  of  recurrent,  contrasting  phonetic  properties  or  features. 
Evidently,  there  emerged  in  our  hominid  ancestors  a  combinatorial  principle 
(later,  perhaps,  extended  into  syntax)  by  which  a  finite  set  of  articulatory 
gestures  could  be  repeatedly  permuted  to  produce  a  very  large  number  of  dis¬ 
tinctively  different  patterns. 
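
The scale of this combinatorial principle can be made concrete with a rough count. The sketch below assumes a hypothetical inventory of 30 elements and words of one to five segments, and it ignores the phonotactic constraints that any real language imposes. It therefore gives an upper bound, not an estimate of an actual lexicon.

```python
def count_sequences(inventory_size, max_length):
    """Count ordered sequences of 1..max_length elements, repetition allowed."""
    return sum(inventory_size ** k for k in range(1, max_length + 1))


# Hypothetical inventory of 30 consonants and vowels, words up to 5 segments
print(count_sequences(30, 5))
```

Even with most of these strings ruled out as unpronounceable, the space of candidate forms is far larger than a lexicon of 100,000 words.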

"Articulatory  gesture"  refers,  at  a  gross  level,  to  opening  and  closing 
the  mouth.  Repeated  constriction  of  the  vocal  tract,  somewhere  between  lips 
and  glottis  to  form  consonants,  and  repeated  opening  of  the  tract  by  lowering 
the jaw to form vowels, give rise to the basic consonant-vowel syllable from
which the sound patterns of all spoken languages are formed. The varying
phonetic qualities of consonants and vowels are determined by the precise
shape of the vocal tract through which sound (the buzz of vocal fold vibration
or the hiss of air blown through a narrow constriction) is filtered. The
shape  of  the  resonating  cavities  of  the  vocal  tract  is  determined  by  fine 
positioning  of  the  articulators:  raising,  lowering,  fronting  or  backing  the 
tip,  blade  or  body  of  the  tongue,  raising  or  lowering  the  velum,  rounding  or 
spreading  the  lips,  and  so  on. 

Thus,  permutation  and  combination  of  some  two  dozen  gestures  provide 
"...a  kind  of  impedance  match  between  an  open-ended  set  of  meaningful  symbols 
and  a  decidedly  limited  set  of  signaling  devices"  (Studdert-Kennedy  &  Lane, 
1980,  p.  35).  Yet  permutation  and  combination  alone  would  not  suffice  for  a 
flexible  and  open-ended  system  of  communication,  if  the  gestures  were  not 
executed  rapidly  enough  to  evade  the  limits  of  short-term  memory  and  to  match 
the  natural  rate  of  thought  and  action. 

What  this  "natural  rate"  may  be  we  do  not  know.  But  for  English,  at 
least, a typical rate of speech is of the order of 150 words/min. This
reduces to roughly 10 to 15 phonemes (consonants and vowels) per second. As Cooper has
remarked,  such  rates  can  be  achieved  "...only  if  separate  parts  of  the  articu¬ 
latory  machinery — muscles  of  the  lips,  tongue,  velum,  etc. — can  be  separately 
controlled, and if...a change of state for any one of these articulatory
entities, taken together with the current state of others, is a change
to...another phoneme.... It is this kind of parallel processing that makes it
possible to get high-speed performance with low-speed machinery" (Liberman,
Cooper, Shankweiler, & Studdert-Kennedy, 1967, p. 446). Thus, repeated use of
a small set of interleaved gestures may not only expand the potential lexicon,
but  also  ensure  rapid  execution  of  its  elements. 
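
The arithmetic behind the figure of 10 to 15 phonemes per second can be checked directly. The sketch below assumes an average of four to six phonemes per word. That range is our own illustrative assumption, chosen because it reproduces the stated figure, and is not given in the text.

```python
def phonemes_per_second(words_per_minute, phonemes_per_word):
    """Convert a speaking rate in words/min to phonemes/s."""
    return words_per_minute / 60 * phonemes_per_word


# 150 words/min with an assumed 4 to 6 phonemes per word
low = phonemes_per_second(150, 4)
high = phonemes_per_second(150, 6)
print(low, high)
```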

Let  me  conclude  this  brief  introduction  by  noting  that  the  dual  motoric 
structure  of  spoken  language  has  no  known  parallel  in  any  other  system  of  ani¬ 
mal  behavior,  except  manual-facial  sign  languages.  Over  the  past  15-20  years 
we  have  learned  that  American  Sign  Language  (ASL),  the  first  language  of  over 
100,000  deaf  persons,  and  the  fourth  most  common  language  in  the  United  States 
(Mayberry,  1978),  is  a  fully  independent  language  with  its  own  characteristic 
formational ("phonological") structure and syntax (Klima & Bellugi, 1979).
Whether  signed  language  is  a  mere  analog  of  spoken  language  or  a  true  homolog, 
drawing  on  the  same  neural  structures,  we  do  not  yet  know — although  studies  of 
sign language deficits following left hemisphere lesion reveal remarkable parallels
with aphasic deficits of spoken language users (e.g., Kimura, Battison,
& Lubert, 1976; Poizner, Bellugi, & Iragui, in press).

In  any  event,  my  point  here  is  simply  that  each  ASL  sign  is  formed  by 
combining  four  intrinsically  meaningless  components:  a  hand  configuration,  a 
palm  orientation,  a  place  in  the  body  space  where  it  is  formed,  and  a  move¬ 
ment.  These  four  classes  of  component,  like  the  two  segmental  classes  of  spo¬ 
ken  language  (consonants  and  vowels),  may  also  be  described  in  terms  of  a 
smaller  set  of  recurrent,  contrasting  features  (e.g.,  Klima  &  Bellugi,  1979, 
Chapter 7). There are some fifty values, or "primes," distributed across the
four  dimensions  and  their  combination  in  a  sign  follows  "phonological  rules," 
analogous  to  those  that  constrain  the  structure  of  a  syllable  in  spoken 
languages.  In  short,  both  spoken  and  signed  languages  exploit  combinatorial 
principles  of  lexical  formation.  Moreover,  it  would  seem  that  short-term  mem¬ 
ory  and  cognitive  capacity  have  constrained  signed  and  spoken  languages  to 
similar  rates  of  communication.  For,  although  each  ASL  sign  takes  roughly 
three  times  as  long  to  form  as  an  English  word,  the  proposition  rates  in  the 
two  languages  are  almost  identical  (Klima  &  Bellugi,  1979).  This  is  possible 
because,  while  the  phonological  and  syntactic  structures  of  a  spoken  language 
are  largely  implemented  by  sequential  organization  over  time,  a  signed  lan¬ 
guage  can  exploit  simultaneous  manual  and  facial  gestures  distributed  in 
space.  Thus,  both  types  of  languages  are  grounded  in  a  capacity  for  rapid, 
precise,  and  precisely  coordinated  movements  of  a  small  set  of  articulators. 

In  what  follows,  I  shall  have  little  further  to  say  about  signed 
languages.  Here,  I  simply  note  two  points.  First,  we  do  not  talk  with  our 
toes,  and  we  may  doubt  whether  any  imaginable  system  of  human  articulators, 
other  than  those  of  the  hand  and  mouth,  would  be  capable  of  the  motor  speed 
and  precision  necessary  to  implement  language,  as  we  know  it.  Second, 
whatever  the  evolutionary  sequence  may  have  been,  the  well-established  (albeit 
imperfect)  correlation  between  hemispheric  specializations  for  language  and 
manual  praxis  is,  I  assume,  not  mere  coincidence.  In  all  likelihood,  the  two 
modes  of  language  draw  on  closely  related  neural  structures. 

I  have  dwelt  so  far  on  motor  requirements.  But  there  are  perceptual  de¬ 
mands  also.  If  spoken  language  is  indeed  constructed  from  rapid  sequences  of 
consonants  and  vowels,  the  listener  must  somehow  extract  these  recurrent  ele¬ 
ments  from  the  signal.  Yet,  from  the  earliest  spectrographic  studies  (Joos, 
1948) it has been known that the acoustic flow of speech cannot be readily divided
into an alphabetic sequence of invariant segments corresponding to the
invariant  segments  of  linguistic  description.  The  reason  for  this  is  simply 
that  we  do  not  speak  segment  by  segment,  or  even  syllable  by  syllable.  At  any 
instant,  the  several  articulators  are  executing  a  complex,  interleaved  pattern 
of movements, of which the spatio-temporal coordinates reflect the influence
of  several  neighboring  segments.  (The  reader  may  test  this  by  slowly  utter¬ 
ing,  for  example,  the  words  call  and  keel.  The  reader  will  find  that  the 
position  of  the  tongue  on  the  palate  during  closure  for  the  first  consonant, 
/k/,  is  slightly  farther  back  for  the  first  word  than  for  the  second.)  The 
consequence  of  this  imbricated  pattern  of  movement  is,  of  course,  an  imbricat¬ 
ed  pattern  of  sound,  such  that  any  particular  acoustic  segment  typically  spec¬ 
ifies  more  than  one  linguistic  segment,  while  any  particular  linguistic  seg¬ 
ment  is  specified  by  more  than  one  acoustic  segment  (Fant,  1962;  Liberman  et 
al.,  1967).  This  lack  of  isomorphism  between  acoustic  and  linguistic  struc¬ 
ture  is  the  central  unsolved  problem  of  speech  perception.  Its  continued 
recalcitrance is reflected in the fact that we are little closer to automatic
phonetic  transcription  of  speech  now  than  we  were  thirty  years  ago  (Levinson  & 
Liberman,  1981). 

Many  different  approaches  to  the  problem  have  been  proposed,  but  I  will 
not  review  them  here  (see  Studdert-Kennedy,  1980,  for  fuller  discussion). 
Instead, I will attempt to recast the problem by setting aside, for the moment,
the discrepancy between acoustic signal and linguistic description, and
simply asking what we know about how a child learns to speak. I shall assume
that, whatever the process, it is sufficiently general to permit the deaf
child to learn to sign with as much ease as a hearing child learns to speak.
I note, further, that when a child learns to sign or speak, it learns a
specific dialect. That is to say, it gradually discovers, in the detailed
acoustic  or  optic  patterns  of  its  caretakers'  signals,  specifications  for  a  no 
less  detailed  pattern  of  motor  organization. 

Stated  in  this  way,  the  problem  becomes  a  special  case  of  the  general 
problem  of  imitation.  Relatively  few  species  imitate.  The  higher  primates 
imitate  general  bodily  actions,  but  vocal  imitation  is  peculiar  to  a  few 
species  of  songbirds,  certain  marine  mammals,  and  humans.  The  capacity  to 
imitate  is  evidently  a  rare,  specialized  capacity  for  discovering  links  be¬ 
tween  perceived  movements  and  their  corresponding  motor  controls. 

We  may  gain  insight  into  the  bases  of  speech  imitation  from  recent  stud¬ 
ies  of  "lip-reading"  in  adults  and  infants.  That  adults  can  learn  to  lip-read 
is,  of  course,  a  commonplace  of  aural  rehabilitation,  but  the  theoretical 
implications of this capacity have only recently begun to emerge. McGurk and
MacDonald (1976) demonstrated that listeners' perceptions of a spoken syllable
often change if they simultaneously watch a video display of a speaker
pronouncing a different syllable. For example, if listeners are presented
with the acoustic syllable [ba] repeated four times, while watching a synchronized
optic display of a speaker articulating [ba, va, ða, da], they will
typically report the latter, optically specified sequence. That the effect is
not simply a matter of visual dominance in a sensory hierarchy (Marks, 1978)
is evidenced by the fact that certain combinations (e.g., acoustic [ba] with
optic [ga]) may be perceived as clusters ([bga] or [gba]), or even as syllables
corresponding to neither display ([da]). Thus listeners' percepts seem to
arise from a process by which two distinct sources of information, acoustic
and optic, are actively combined at an abstract level where each has already
lost its distinctive sensory quality. (For fuller discussion, see Summerfield,
1979.)

Further evidence for an amodal representation of speech comes from a
cross-modal  study  of  the  so-called  suffix  effect  by  Campbell  and  Dodd  (1980). 
A  standard  finding  of  short-term  memory  studies  is  that  listeners,  recalling  a 
list  of  auditorily  presented  words,  recall  those  at  the  end  better  than  those 
in  the  middle  (recency  effect).  The  effect  is  reduced  if  the  list  is  present¬ 
ed  graphically.  Moreover,  Crowder  and  Morton  (1969)  demonstrated  that  the  ef¬ 
fect  could  be  abolished,  or  significantly  reduced,  if  a  spoken  word  was 
appended  to  the  list,  not  for  recall  but  simply  as  a  signal  to  begin  recall 
(suffix  effect).  Presumably,  the  suffix  "interferes"  in  some  way  with  the 
representation  of  recent  items.  That  this  representation  is  at  some  relative¬ 
ly  "low,"  yet  structured,  level  is  argued  by  the  facts  that  the  effect  (1)  is 
unaffected  by  degree  of  semantic  similarity  between  suffix  and  list,  (2)  is 
reduced  if  suffix  and  list  are  presented  to  opposite  ears,  (3)  does  not  occur 
if  the  suffix  is  a  tone  or  burst  of  noise. 

Campbell and Dodd (1980) used this paradigm to test listeners' recall of
digits,  either  lip-read  (without  sound)  or  presented  graphically,  with  and 
without  the  spoken  suffix,  "ten"  (heard,  but  not  seen).  They  found  signif¬ 
icant  recency  and  suffix  effects  for  the  lip-read,  but  not  for  the  graphic, 
lists.  In  a  complementary  study,  Spoehr  and  Corin  (1978)  demonstrated  that  a 
lip-read  suffix  reduced  recall  of  auditorily  presented  lists.  Evidently, 
speech  heard,  but  not  seen,  and  speech  seen,  but  not  heard,  share  a  common 
representation.  Moreover,  the  fact  that  Campbell  and  Dodd  did  not  find  a  suf¬ 
fix  effect  for  graphically  presented  lists  suggests  that  this  shared  represen¬ 
tation  is  not  at  some  abstract,  phonological  level  where  spoken  and  written 
language  converge.  Rather,  these  studies,  like  that  of  McGurk  and  MacDonald 
(1976),  hint  at  a  representation  in  some  form  common  to  both  the  light 
reflected  and  the  sound  radiated  from  mouth  and  lips. 

Consider,  now,  that  infants  are  also  sensitive  to  structural  correspond¬ 
ences  between  the  acoustic  and  optic  specifications  of  an  event.  Spelke 
(1976) showed that four-month-old infants preferred to watch the film (of a
woman  playing  "peekaboo,"  or  of  a  hand  rhythmically  striking  a  wood  block  and 
a  tambourine  with  a  baton)  that  matched  the  sound  track  they  were  hearing. 
Dodd  (1979)  showed  that  four-month-old  infants  watched  the  face  of  a  woman 
reading  nursery  rhymes  more  attentively  when  her  voice  was  synchronized  with 
her  facial  movements  than  when  it  was  delayed  by  400  ms.  If  these  preferences 
were merely for synchrony, we might expect infants to be satisfied with any
acoustic-optic  pattern  in  which  moments  of  abrupt  change  are  arbitrarily  syn¬ 
chronized.  Thus,  in  speech  they  might  be  no  less  attentive  to  an  articulating 
face  whose  closed  mouth  was  synchronized  with  syllable  amplitude  peaks  and 
open  mouth  with  amplitude  troughs  than  to  the  (natural)  reverse.  However, 
Kuhl  and  Meltzoff  (1982)  showed  that  four-  to  five-month-old  infants  looked 
longer  at  the  face  of  a  woman  articulating  the  vowel  they  were  hearing  (either 
[i]  or  [a])  than  at  the  same  face  articulating  the  other  vowel  in  synchrony. 
Moreover,  the  preference  disappeared  when  the  signals  were  pure  tones,  matched 
in  amplitude  and  duration  to  the  vowels,  so  that  the  infant  preference  was 
evidently for a match between a mouth shape and a particular spectral structure.
Similarly, MacKain et al. (1983) showed that five- to six-month-old
infants preferred to look at the face of a woman repeating the disyllable they
were hearing (e.g., [zuzi]) rather than at the synchronized face of the same woman
repeating another disyllable (e.g., [vava]).

In both these studies, the infants' preferences were for natural
structural correspondences between acoustic and optic information. Both studies
hint at infant sensitivity to intermodal correspondences that could play a
role in learning to speak. However, I am not suggesting that optic information
is necessary, since the blind infant also learns to speak.* My intent
rather  is  to  gain  leverage  on  the  puzzle  of  imitation.  What  we  need  therefore 
is  to  establish  that  the  underlying  metric  of  auditory-visual  correspondence 
is  related  to  that  of  the  auditory-motor  correspondence  required  for  an 
individual  to  imitate  the  utterances  of  another. 

To this end we may note, first, the visual-motor link evidenced in the
capacity to imitate facial expression and, second, the association across many
primate species between facial expression and pattern of vocalization (Hooff,
1976; Marler, 1975; Ohala, 1983). Recently, Field et al. (1982) reported that
36-hour-old  infants  could  imitate  the  "happy,  sad  and  surprised"  expressions 
of  a  model.  However,  these  are  relatively  stereotyped  emotional  responses 
that might be evoked without recourse to the visual-motor link required for
imitation of novel movements. More striking is the work of Meltzoff and Moore
(1977), who showed that 12- to 21-day-old infants could imitate both arbitrary
mouth movements, such as tongue protrusion and mouth opening, and (of particular
interest for the acquisition of ASL) arbitrary hand movements, such as
opening and closing the hand by serially moving the fingers. Here mouth opening
was elicited without vocalization; but had vocalization occurred, its
structure would, of course, have reflected the shape of the mouth. Kuhl and
Meltzoff (1982) do, in fact, report as an incidental finding of their study
that 10 of their 32 four- to five-month-old infants "...produced sounds that resembled
the adult female's vowels. They seemed to be imitating the female
talker,  'taking  turns'  by  alternating  their  vocalizations  with  hers" 
(p.  1140).  If  we  accept  the  evidence  that  the  infants  of  this  study  were 
recognizing  acoustic-optic  correspondences,  and  add  to  it  the  results  of  the 
adult  lipreading  studies,  calling  for  a  metric  in  which  acoustic  and  optic 
information  are  combined,  then  we  may  conclude  that  the  perceptual  structure 
controlling  the  infants'  imitations  was  specified  in  this  common  metric. 

Evidently, the desired metric must be "...closely related to that of
articulatory dynamics" (Summerfield, 1979, p. 329). Following Runeson and
Frykholm (1981) (see also Summerfield, 1980), we may suppose that in the visual
perception of an event we perceive not simply the surface kinematics (displacement,
velocity, acceleration), but also the underlying biophysical properties
that define the structure being moved and the forces that move it
(mass,  force,  momentum,  elasticity,  and  so  on).  Similarly,  in  perceiving 
speech,  we  perceive  not  only  its  "kinematics,"  that  is,  the  changes  and  rates 
of  change  in  spectral  structure,  but  also  the  underlying  dynamic  forces  that 
produce  these  changes.  In  other  words,  to  perceive  speech  is  to  perceive 
movements  of  the  articulators,  specified  by  a  pattern  of  radiated  sound,  just 
as  we  perceive  movements  of  the  hand,  specified  by  a  pattern  of  reflected 
light. 

The  close  link,  for  the  infant,  between  perceiving  speech  and  producing 
it,  is  further  suggested  by  a  curious  aspect  of  the  study  by  MacKain  et 
al. (1983), cited earlier. This is the fact that infants' preference for a
match between the facial movements they were watching and the speech sounds
they were hearing was statistically significant only when they were looking to
their right sides. Fourteen of the eighteen infants in the study preferred
more matches on their right sides than on their left. Moreover, in a follow-up
investigation of familial handedness, MacKain and her colleagues have
learned that six of the infants have left-handed first- or second-order relatives.
Of these six, four are the infants who displayed more left-side than
right-side  matches. 

These  results  can  be  interpreted  in  the  light  of  studies  by  Kinsbourne 
and  his  colleagues.  Kinsbourne  (1972)  found  that  right-handed  adults  tended 
to  shift  their  gaze  to  the  right  while  solving  verbal  problems,  to  the  left 
while  visualizing  spatial  relations;  left-handers  tended  to  shift  gaze  in  the 
same  direction  for  both  types  of  task,  with  each  direction  roughly  equally 
represented  across  the  subject  group.  Lempert  and  Kinsbourne  (1982)  showed 
that  the  effect  was  reversible  for  right-handed  subjects  on  a  verbal  task: 
subjects  who  rehearsed  sentences  with  head  and  eyes  turned  right  recalled  the 
sentences better than subjects who rehearsed while turned left. Thus, attention
to one side of the body may facilitate processes for which the contralateral
hemisphere is specialized.

Extending this interpretation to the infants of MacKain et al. (1983), we
may infer that infants with a preference for matches on the right side, rather
than the left, were revealing a left-hemisphere capacity for recognizing
acoustic-optic correspondence in speech. If, further, the metric specifying
these correspondences is the same as that specifying the auditory-motor correspondences
necessary for imitation (as was argued above), we may conclude that
five-  to  six-month-old  infants  already  display  a  speech  perceptuo-motor  link 
in  the  left  hemisphere. 

How early this link may develop we do not yet know. However, Best et
al. (1982), testing two-, three-, and four-month-old infants dichotically, in
a  cardiac  habituation  paradigm,  found  a  right-ear  advantage  for  speech  and  a 
left-ear  advantage  for  music  in  the  three-  and  four-month  olds,  but  only  a 
left-ear  advantage  for  music  in  the  two-month  olds.  We  may  suspect,  then, 
that  the  perceptual  component  of  the  speech  link  begins  to  develop  between  the 
second  and  third  months  of  life.  By  five  to  six  months,  close  to  the  onset  of 
babbling,  the  motor  component  is  beginning  to  emerge.  By  the  end  of  the  first 
year,  as  babbling  fades,  the  infant  would  be  equipped  with  the  perceptuo-motor 
mechanisms  necessary  for  imitating  the  sounds  of  the  language  it  is  going  to 
learn. 

In  conclusion,  let  me  recall  the  paradoxical  discrepancy  between  the 
speech  signal  and  its  linguistic  description  with  which  I  began.  The  approach 
to  imitation  I  have  sketched  deliberately  sidesteps  this  problem.  Yet  it  may 
ultimately  contribute  to  its  solution  by  focusing  on  the  infant  for  whom  the 
discrepancy  does  not  yet  exist,  for  the  simple  reason  that  the  infant  has  not 
yet  learned  the  phonetic  categories  of  its  language.  Tracing  the  process  by 
which  the  recurrent  patterns  of  infant  articulation  coalesce  into  categorical 
linguistic  units,  evidenced  by  spoonerisms  and  other  adult  speech  errors 
(Shattuck-Hufnagel, 1979), is a task for the future. However, the task may be
easier,  if  we  see  it  as  a  problem  in  the  development  of  a  unique  mode  of  motor 
control,  characteristic  of  human  language. 

Footnote

*I have often heard it said that blind children develop language more
slowly than their sighted peers, but I know of no systematic study on the topic.

Abstract. A motor theory of speech perception, initially proposed
to  account  for  results  of  early  experiments  with  synthetic  speech, 
is  now  extensively  revised  to  accommodate  recent  findings,  and  to 
relate  the  assumptions  of  the  theory  to  those  that  might  be  made 
about  other  perceptual  modes.  According  to  the  revised  theory, 
phonetic  information  is  perceived  in  a  biologically  distinct  system, 
a  "module"  specialized  to  detect  the  intended  gestures  of  the  speak¬ 
er  that  are  the  basis  for  phonetic  categories.  Built  into  the 
structure  of  this  module  is  the  unique  but  lawful  relationship  be¬ 
tween  the  gestures  and  the  acoustic  patterns  in  which  they  are  vari¬ 
ously  overlapped.  In  consequence,  the  module  causes  perception  of 
phonetic  structure  without  translation  from  preliminary  auditory 
impressions.  Thus,  it  is  comparable  to  such  other  modules  as  the 
one  that  enables  an  animal  to  localize  sound.  Peculiar  to  the 
phonetic  module  are  the  relation  between  perception  and  production 
it  incorporates  and  the  fact  that  it  must  compete  with  other  modules 
for  the  same  stimulus  variations. 

Together  with  some  of  our  colleagues,  we  have  long  been  identified  with  a 
view  of  speech  perception  that  is  often  referred  to  as  a  "motor  theory."  Not 
the  motor  theory,  to  be  sure,  because  there  are  other  theories  of  perception 
that,  like  ours,  assign  an  important  role  to  movement  or  its  sources.  But  the 
theory  we  are  going  to  describe  is  only  about  speech  perception,  in  contrast 
to some that deal with other perceptual processes (e.g., Berkeley, 1709; Festinger,
Burnham, Ono, & Bamber, 1967) or, indeed, with all of them (e.g.,
Washburn, 1926; Watson, 1919). Moreover, our theory is motivated by
considerations that do not necessarily apply outside the domain of speech.
Yet even there we are not alone, for several theories of speech perception,
being more or less "motor," resemble ours to varying degrees (e.g., Chistovich,
1960; Dudley, 1940; Joos, 1948; Ladefoged & McKinney, 1963; Stetson,
1951). However, it is not relevant to our purposes to compare these, so, for
convenience,  we  will  refer  to  our  motor  theory  as  the  motor  theory. 

We  were  led  to  the  motor  theory  by  an  early  finding  that  the  acoustic 
patterns  of  synthetic  speech  had  to  be  modified  if  an  invariant  phonetic  per¬ 
cept  was  to  be  produced  across  different  contexts  (Cooper,  Delattre,  Liberman, 
Borst,  &  Gerstman,  1952;  Liberman,  Delattre,  &  Cooper,  1952).  Thus,  it  ap¬ 
peared  that  the  objects  of  speech  perception  were  not  to  be  found  at  the 
acoustic  surface.  They  might,  however,  be  sought  in  the  underlying  motor  pro¬ 
cesses,  if  it  could  be  assumed  that  the  acoustic  variability  required  for  an 
invariant  percept  resulted  from  the  temporal  overlap,  in  different  contexts, 
of  correspondingly  invariant  units  of  production.  In  its  most  general  form, 
this  aspect  of  the  early  theory  survives,  but  there  have  been  important  revi¬ 
sions,  including  especially  the  one  that  makes  perception  of  the  motor  invari¬ 
ant  depend  on  a  specialized  phonetic  mode  (Liberman,  1982;  Liberman,  Cooper, 
Shankweiler, & Studdert-Kennedy, 1967; Liberman & Studdert-Kennedy, 1978; Mattingly
& Liberman, 1969). Our aim in this paper is to present further revisions,
and so bring the theory up to date.

The  Theory 

The  first  claim  of  the  motor  theory,  as  revised,  is  that  the  objects  of 
speech  perception  are  the  intended  phonetic  gestures  of  the  speaker, 
represented  in  the  brain  as  invariant  motor  commands  that  call  for  movements 
of  the  articulators  through  certain  linguistically  significant  configurations. 
These  gestural  commands  are  the  physical  reality  underlying  the  traditional 
phonetic  notions — for  example,  "tongue  backing,"  "lip  rounding,"  and  "jaw 
raising" — that  provide  the  basis  for  phonetic  categories.  They  are  the  ele¬ 
mentary  events  of  speech  production  and  perception.  Phonetic  segments  are 
simply  groups  of  one  or  more  of  these  elementary  events;  thus  [b]  consists  of 
a labial stop gesture and [m] of that same gesture combined with a velum-lowering
gesture. Phonologically, of course, the gestures themselves must
be viewed as groups of features, such as "labial," "stop," "nasal," but these
features are attributes of the gestural events, not events as such. To perceive
an utterance, then, is to perceive a specific pattern of intended gestures.

We  have  to  say  "intended  gestures,"  because,  for  a  number  of  reasons 
(coarticulation  being  merely  the  most  obvious),  the  gestures  are  not  directly 
manifested  in  the  acoustic  signal  or  in  the  observable  articulatory  movements. 
It  is  thus  no  simple  matter  (as  we  shall  see  in  a  later  section)  to  define 
specific  gestures  rigorously  or  to  relate  them  to  their  observable  conse¬ 
quences.  Yet,  clearly,  invariant  gestures  of  some  description  there  must  be, 
for  they  are  required,  not  merely  for  our  particular  theory  of  speech  percep¬ 
tion,  but  for  any  adequate  theory  of  speech  production. 

The  second  claim  of  the  theory  is  a  corollary  of  the  first:  if  speech 
perception  and  speech  production  share  the  same  set  of  invariants,  they  must 
be  intimately  linked.  This  link,  we  argue,  is  not  a  learned  association,  a 
result  of  the  fact  that  what  people  hear  when  they  listen  to  speech  is  what 
they  do  when  they  speak.  Rather,  the  link  is  innately  specified,  requiring 
only  epigenetic  development  to  bring  it  into  play.  On  this  claim,  perception 
of  the  gestures  occurs  in  a  specialized  mode,  different  in  important  ways  from 
the  auditory  mode,  responsible  also  for  the  production  of  phonetic  structures, 
and  part  of  the  larger  specialization  for  language.  The  adaptive  function  of 
the perceptual side of this mode, the side with which the motor theory is
directly  concerned,  is  to  make  the  conversion  from  acoustic  signal  to  gesture 
automatically,  and  so  to  let  listeners  perceive  phonetic  structures  without 
mediation  by  (or  translation  from)  the  auditory  appearances  that  the  sounds 
might,  on  purely  psychoacoustic  grounds,  be  expected  to  have. 

A  critic  might  note  that  the  gestures  do  produce  acoustic  signals,  after 
all,  and  that  surely  it  is  these  signals,  not  the  gestures,  which  stimulate 
the listener's ear. What can it mean, then, to say it is the gestures, not
the  signals,  that  are  perceived?  Our  critic  might  also  be  concerned  that  the 
theory  seems  at  first  blush  to  assign  so  special  a  place  to  speech  as  to  make 
it  hard  to  think  about  in  normal  biological  terms.  We  should,  therefore,  try 
to  forestall  misunderstanding  by  showing  that,  wrong  though  it  may  be,  the 
theory  is  neither  logically  meaningless  nor  biologically  unthinkable. 

An  Issue  That  Any  Theory  of  Speech  Perception  Must  Meet.  The  motor  theo¬ 
ry  would  be  meaningless  if  there  were,  as  is  sometimes  supposed,  a  one-to-one 
relation  between  acoustic  patterns  and  gestures,  for  in  that  circumstance  it 
would  matter  little  whether  the  listener  was  said  to  perceive  the  one  or  the 
other.  Metaphysical  considerations  aside,  the  proximal  acoustic  patterns 
might  as  well  be  the  perceived  distal  objects.  But  the  relation  between  ges¬ 
ture  and  signal  is  not  straightforward.  The  reason  is  that  the  timing  of  the 
articulatory  movements — the  peripheral  realizations  of  the  gestures — is  not 
simply  related  to  the  ordering  of  the  gestures  that  is  implied  by  the  strings 
of symbols in phonetic transcriptions; the movements for gestures implied by
a  single  symbol  are  typically  not  simultaneous,  and  the  movements  implied  by 
successive  symbols  often  overlap  extensively.  This  coarticulation  means  that 
the  changing  shape  of  the  vocal  tract,  and  hence  the  resulting  signal,  is  in¬ 
fluenced  by  several  gestures  at  the  same  time.  Thus,  the  relation  between 
gesture  and  signal,  though  certainly  systematic,  is  systematic  in  a  way  that 
is  peculiar  to  speech.  In  later  sections  of  the  paper  we  will  consider  how 
this  circumstance  bears  on  the  perception  of  speech  and  its  theoretical 
interpretation.  For  now,  however,  we  wish  only  to  justify  consideration  of 
the  motor  theory  by  identifying  it  as  one  of  several  choices  that  the  complex 
relation  between  gesture  and  signal  faces  us  with.  For  this  purpose,  we  will 
describe  just  one  aspect  of  the  relation,  that  we  may  then  use  it  as  an  exam¬ 
ple. 

When  coarticulation  causes  the  signal  to  be  influenced  simultaneously  by 
several gestures, a particular gesture will necessarily be represented by different
sounds in different phonetic contexts. In a consonant-vowel syllable,
for  example,  the  acoustic  pattern  that  contains  information  about  the  place  of 
constriction  of  the  consonantal  gesture  will  vary  depending  on  the  following 
vowel.  Such  context-conditioned  variation  is  most  apparent,  perhaps,  in  the 
transitions  of  the  formants  as  the  constriction  is  released.  Thus,  place 
information  for  a  given  consonant  is  carried  by  a  rising  transition  in  one 
vowel  context  and  a  falling  transition  in  another  (Liberman,  Delattre,  Cooper, 
& Gerstman, 1954). In isolation, these transitions sound like two different
glissandi  or  chirps,  which  is  just  what  everything  we  know  about  auditory 
perception  leads  us  to  expect  (Mattingly,  Liberman,  Syrdal,  &  Halwes,  1971); 
they  do  not  sound  alike,  and,  just  as  important,  neither  sounds  like  speech. 
How  is  it,  then,  that,  in  context,  they  nevertheless  yield  the  same  consonant? 

Auditory theories and the accounts they provide. The guiding
assumption of one class of theories is that ordinary auditory processes are
sufficient  to  explain  the  perception  of  speech;  there  is  no  need  to  invoke  a 
further  specialization  for  language,  certainly  not  one  that  gives  the  listener 
access  to  gestures.  The  several  members  of  this  class  differ  in  principle, 
though  they  are  often  combined  in  practice. 

One  member  of  the  class  counts  two  stages  in  the  perceptual  process:  a 
first  stage  in  which,  according  to  principles  that  apply  to  the  way  we  hear 
all  sounds,  the  auditory  appearances  of  the  acoustic  patterns  are  registered, 
followed  by  a  second  stage  in  which,  by  an  act  of  sorting  or  matching  to 
prototypes,  phonetic  labels  are  affixed  (Crowder  &  Morton,  1969;  Fujisaki  & 
Kawashima,  1970;  Oden  &  Massaro,  1978;  Pisoni,  1973).  Just  why  such  different 
acoustic  patterns  as  the  rising  and  falling  transitions  of  our  example  deserve 
the  same  label  is  not  explicitly  rationalized,  it  being  accounted,  presumably, 
a  characteristic  of  the  language  that  the  processes  of  sorting  or  matching  are 
able  to  manage.  Nor  does  the  theory  deal  with  the  fact  that,  in  appropriate 
contexts,  these  transitions  support  phonetic  percepts  but  do  not  also  produce 
such  auditory  phenomena  as  chirps.  To  the  contrary,  indeed,  it  is  sometimes 
made  explicit  that  the  auditory  stage  is  actually  available  for  use  in 
discrimination.  Such  availability  is  not  always  apparent  because  the  casual 
(or  forgetful)  listener  is  assumed  to  rely  on  the  categorical  labels,  which 
persist  in  memory,  rather  than  on  the  context-sensitive  auditory  impressions, 
which  do  not;  but  training  or  the  use  of  more  sensitive  psychophysical  methods 
is  said  to  give  better  access  to  the  auditory  stage  and  thus  to  the  stimulus 
variations — including,  presumably,  the  differences  in  formant  transition — that 
the labels ignore (Carney, Widin, & Viemeister, 1977; Pisoni & Tash, 1974;
Samuel,  1977). 

Another  member  of  the  class  of  auditory  theories  avoids  the  problem  of 
context-conditioned  variation  by  denying  its  importance.  According  to  this 
theory,  speech  perception  relies  on  there  being  at  least  a  brief  period  during 
each  speech  sound  when  its  short-time  spectrum  is  reliably  distinct  from  those 
of  other  speech  sounds.  For  an  initial  stop  in  a  stressed  syllable,  for  exam¬ 
ple,  this  period  includes  the  burst  and  the  first  10  ms  after  the  onset  of 
voicing  (Stevens  &  Blumstein,  1978).  That  a  listener  is  nevertheless  able  to 
identify  speech  sounds  from  which  these  invariant  attributes  have  been  removed 
is  explained  by  the  claim  that,  in  natural  speech,  they  are  sometimes  missing 
or  distorted,  so  that  the  child  must  learn  to  make  use  of  secondary,  con¬ 
text-conditioned  attributes,  such  as  formant  transitions,  which  ordinarily 
co-occur  with  the  primary,  invariant  attributes  (Cole  &  Scott,  1974).  Thus, 
presumably,  the  different-sounding  chirps  develop  in  perception  to  become  the 
same-sounding  (non-chirpy)  phonetic  element  with  which  they  have  been 
associated. 

The  remaining  member  of  this  class  of  theories  is  the  most  thoroughly  au¬ 
ditory  of  all.  By  its  terms,  the  very  processes  of  phonetic  classification 
depend  directly  on  properties  of  the  auditory  system,  properties  so  indepen¬ 
dent  of  language  as  to  be  found,  perhaps,  in  all  mammals  (Kuhl,  1981;  Miller, 
1977;  Stevens,  1975).  As  described  most  commonly  in  the  literature,  this  ver¬ 
sion  of  the  auditory  theory  takes  the  perceived  boundary  between  one  phonetic 
category  and  another  to  correspond  to  a  naturally-occurring  discontinuity  in 
perception  of  the  relevant  acoustic  continuum.  There  is  thus  no  first  stage 
in  which  the  (often)  different  auditory  appearances  are  available,  nor  is 
there  a  process  of  learned  equivalence.  An  example  is  the  claim  that  the 
distinction between voiced and voiceless stops — normally cued by a complex of
acoustic  differences  caused  by  differences  in  the  phonetic  variable  known  as 
voice-onset-time — depends  on  an  auditory  discontinuity  in  sensitivity  to  tem¬ 
poral  relations  among  components  of  the  signal  (Kuhl  &  Miller,  1975;  Pisoni, 
1977).  Another  is  the  suggestion  that  the  boundary  between  fricative  and 
affricate  on  a  rise-time  continuum  is  the  same  as  the  rise-time  boundary  in 
the analogous nonspeech case, that is, the boundary that separates the nonspeech
percepts "pluck" and "bow" (Cutting & Rosner, 1974; but see Rosen & Howell,
1981). To account for the fact that such discontinuities move as a
function of phonetic context or rate of articulation, one can add the assumption
that the several components of the acoustic signal give rise to interactions
of a purely auditory sort (Hillenbrand, 1984; but see Summerfield,
1982).  As  for  the  rising  and  falling  formant  transitions  of  our  earlier  exam¬ 
ple,  some  such  assumption  of  auditory  interaction  (between  the  transitions  and 
the  remainder  of  the  acoustic  pattern)  would  presumably  be  offered  to  account 
for  the  fact  that  they  sound  like  two  different  glissandi  in  isolation,  but  as 
the same (non-glissando-like) consonant in the context of the acoustic syllable.
The clear implication of this theory is that, for all phonetic contexts
and for every one of the many acoustic cues that are known to be of consequence
for each phonetic segment, the motivation for articulatory and
coarticulatory  maneuvers  is  to  produce  just  those  acoustic  patterns  that  fit 
the  language-independent  characteristics  of  the  auditory  system.  Thus,  this 
last  auditory  theory  is  auditory  in  two  ways:  speech  perception  is  governed 
by  auditory  principles,  and  so,  too,  is  speech  production. 

The account provided by the motor theory. The motor theory offers a
view radically different from the auditory theories, most obviously in the
claim that speech perception is not to be explained by principles that apply
to perception of sounds in general, but must rather be seen as a specialization
for phonetic gestures. Incorporating a biologically based link between
perception and production, this specialization prevents listeners from hearing
the signal as an ordinary sound, but enables them to use the systematic, yet
special, relation between signal and gesture to perceive the gesture. The relation
is systematic because it results from lawful dependencies among gestures,
articulator movements, vocal-tract shapes, and signal. It is special
because it occurs only in speech.

Applying the motor theory to our example, we suggest what has seemed
obvious since the importance of the transitions was discovered: the listener
uses the systematically varying transitions as information about the coarticulation
of an invariant consonant gesture with various vowels, and so perceives
this gesture. Perception requires no arbitrary association of signal with
phonetic category, and no correspondingly arbitrary progression from an auditory
stage (e.g., different-sounding glissandi) to a superseding phonetic label.
As Studdert-Kennedy (1976) has put it, the phonetic category "names itself."

By  way  of  comparison  with  the  last  of  the  auditory  theories  we  described, 
we  note  that,  just  as  this  theory  is  in  two  ways  auditory,  the  motor  theory  is 
in  two  ways  motor.  First,  because  it  takes  the  proper  object  of  phonetic 
perception  to  be  a  motor  event.  And,  second,  because  it  assumes  that  adapta¬ 
tions  of  the  motor  system  for  controlling  the  organs  of  the  vocal  tract  took 
precedence  in  the  evolution  of  speech.  These  adaptations  made  it  possible, 
not  only  to  produce  phonetic  gestures,  but  also  to  coarticulate  them  so  that 
they could be produced rapidly. A perceiving system, specialized to take account
of the complex acoustic consequences, developed concomitantly. Accordingly,
the theory is not indifferently perceptual or motor, implying simply
that  the  basis  of  articulation  and  the  object  of  perception  are  the  same. 
Rather,  the  emphasis  is  quite  one-sided;  therefore,  the  theory  fully  deserves 
the  epithet  "motor." 

How  the  Motor  Theory  Makes  Speech  Perception  Like  Other  Specialized 
Perceiving  Systems.  The  specialized  perceiving  system  that  the  motor  theory 
assumes is not unique; it is, rather, one of a fairly large class of special
systems or "modules." Accordingly, one can think about it in familiar biological
terms. Later, we will consider more specifically how the phonetic module
fits  the  concept  of  modularity  developed  recently  by  Fodor  (1983);  our  concern 
now  is  only  to  compare  the  phonetic  module  with  others. 

The  modules  we  refer  to  have  in  common  that  they  are  special  neural 
structures,  designed  to  take  advantage  of  a  systematic  but  unique  relation  be¬ 
tween  a  proximal  display  at  the  sense  organ  and  some  property  of  a  distal  ob¬ 
ject.  A  result  in  all  cases  is  that  there  is  not,  first,  a  cognitive 
representation  of  the  proximal  pattern  that  is  modality-general,  followed  by 
translation  to  a  particular  distal  property;  rather,  perception  of  the  distal 
property  is  immediate,  which  is  to  say  that  the  module  has  done  all  the  hard 
work.  Consider  auditory  localization  as  an  example.  One  of  several  cues  is 
differences in time of arrival of particular frequency components of the signal
at the two ears (see Hafter, 1984, for a review). No one would claim that
the  use  of  this  cue  is  part  of  the  general  auditory  ability  to  perceive,  as 
such,  the  size  of  the  time  interval  that  separates  the  onsets  of  two  different 
signals.  Certainly,  this  kind  of  general  auditory  ability  does  exist,  but  it 
is  no  part  of  auditory  localization,  either  psychologically  or  physiological¬ 
ly.  Animals  perceive  the  location  of  sounding  objects  only  by  means  of  neural 
structures  specialized  to  take  advantage  of  the  systematic  but  special  rela¬ 
tion  between  proximal  stimulus  and  distal  location  (see,  for  example,  Knudsen, 
1984).  The  relation  is  systematic  for  obvious  reasons;  it  is  special  because 
it  depends  on  the  circumstance  that  the  animal  has  two  ears,  and  that  the  ears 
are  set  a  certain  distance  apart.  In  the  case  of  the  human,  the  only  species 
for which the appropriate test can be made, there is no translation from perceived
disparity in time because there is no perceived disparity.
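
The dependence of this cue on the spacing of the ears can be illustrated with a minimal sketch. It assumes a simple far-field, straight-path model that is not taken from the text, in which the interaural time difference equals d * sin(azimuth) / c. The ear separation of 0.18 m, the speed of sound of 343 m/s, and the function names are illustrative assumptions only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def itd_seconds(azimuth_deg, ear_separation_m=0.18):
    """Interaural time difference under an assumed straight-path, far-field model."""
    return ear_separation_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


def azimuth_from_itd(itd_s, ear_separation_m=0.18):
    """Invert the same model: source azimuth in degrees from a time difference."""
    ratio = itd_s * SPEED_OF_SOUND / ear_separation_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.asin(ratio))
```

Under this sketch the same time difference maps to a different azimuth when the ear separation changes, which is one way of picturing why the relation is called special to the animal.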

Compare this with the voicing distinction (e.g., [ba] vs. [pa]) referred
to  earlier,  which  is  cued  in  part  by  a  difference  in  time  of  onset  of  the  sev¬ 
eral  formants,  and  which  has  therefore  been  said  by  some  to  rest  on  a  general 
auditory  ability  to  perceive  temporal  disparity  as  such  (Kuhl  &  Miller,  1975; 
Pisoni,  1977).  We  believe,  to  the  contrary,  that  the  temporal  disparity  is 
only the proximal occasion for the unmediated perception of voicing, a distal
gesture  represented  at  the  level  of  articulation  by  the  relative  timing  of  vo¬ 
cal-tract  opening  and  start  of  laryngeal  vibration  (Lisker  &  Abramson,  1964). 
So  we  should  expect  perceptual  judgments  of  differences  in  signal  onset-time 
to  have  no  more  relevance  to  the  voicing  distinction  than  to  auditory  locali¬ 
zation.  In  neither  case  do  general  auditory  principles  and  procedures 
enlighten  us.  Nor  does  it  help  to  invoke  general  principles  of  auditory 
interaction.  The  still  more  general  principle  that  perception  gives  access  to 
distal  objects  tells  us  only  that  auditory  localization  and  speech  perception 
work  as  they  are  supposed  to;  it  does  not  tell  us  how.  Surely  the  "how"  is  to 
be  found,  not  by  studying  perception,  even  auditory  perception,  in  general, 
but  only  by  studying  auditory  localization  and  speech  perception  in  particu¬ 
lar.  Both  are  special  systems;  they  are,  therefore,  to  be  understood  only  in 
their  own  terms. 

Examples of such biologically specialized perceiving modules can be
multiplied.  Visual  perception  of  depth  by  use  of  information  about  binocular 
disparity  is  a  well-studied  example  that  has  the  same  general  characteristics 
we have attributed to auditory localization and speech (Julesz, 1960, 1971;
Poggio, 1984). And there is presumably much to be learned by comparison with
such biologically coherent systems as those that underlie echolocation in bats
(Suga, 1984) or song in birds (Marler, 1970; Thorpe, 1958). But we will not
elaborate,  for  the  point  to  be  made  here  is  only  that,  from  a  biological  point 
of  view,  the  assumptions  of  the  motor  theory  are  not  bizarre. 

How  the  Motor  Theory  Makes  Speech  Perception  Different  from  Other  Spe¬ 
cialized  Perceiving  Systems.  Perceptual  modules,  by  definition,  differ  from 
one  another  in  the  classes  of  distal  events  that  form  their  domains  and  in  the 
relation  between  these  events  and  the  proximal  displays.  But  the  phonetic 
module  differs  from  others  in  at  least  two  further  respects. 

Auditory and phonetic domains. The first difference is in the
locale of the distal events. In auditory localization, the distal event is
"out there," and the relation between it and the proximal display at the two
ears is completely determined by the principles of physical acoustics. Much
the same can be said of those specialized modules that deal with the
primitives of auditory quality, however they are to be characterized, and that
come into play when people perceive, for example, whistles, horns, breaking
glass, and barking dogs. Not so for the perception of phonetic structure.
There, the distal object is a phonetic gesture or, more explicitly, an
"upstream"  neural  command  for  the  gesture  from  which  the  peripheral  articula¬ 
tory  movements  unfold.  It  follows  that  the  relation  between  distal  object  and 
proximal  stimulus  will  have  the  special  feature  that  it  is  determined  not  just 
by  acoustic  principles  but  also  by  neuromuscular  processes  internal  to  the 
speaker.  Of  course,  analogues  of  these  processes  are  also  available  as  part 
of  the  biological  endowment  of  the  listener.  Hence,  some  kind  of  link  between 
perception  and  production  would  seem  to  characterize  the  phonetic  module,  but 
not  those  modules  that  provide  auditory  localization  or  visual  perception  of 
depth. In a later section, we will have more to say about this link. Now we
will only comment that it may conceivably resemble, in its most general
characteristics, those links that have been identified in the communication
modules of certain nonhuman creatures (Gerhardt & Rheinlaender, 1982; Hoy,
Hahn, & Paul, 1977; Hoy & Paul, 1973; Katz & Gurney, 1981; Margoliash, 1983;
McCasland & Konishi, 1983; Nottebohm, Stokes, & Leonard, 1976; Williams,
1984).

The  motor  theory  aside,  it  is  plain  that  speech  somehow  informs  listeners 
about  the  phonetic  intentions  of  the  talker.  The  particular  claim  of  the  mo¬ 
tor  theory  is  that  these  intentions  are  represented  in  a  specific  form  in  the 
talker's  brain,  and  that  there  is  a  perceiving  module  specialized  to  lead  the 
listener  effortlessly  to  that  representation.  Indeed,  what  is  true  of  speech 
in  this  respect  is  true  for  all  of  language,  except,  of  course,  that  the  more 
distal  object  for  language  is  some  representation  of  linguistic  structure,  not 
merely  of  gesture,  and  that  access  to  this  object  requires  a  module  that  is 
not  merely  phonetic,  but  phonological  and  syntactic  as  well. 

Competition  between  phonetic  and  auditory  modes.  A  second  important 
difference  between  the  phonetic  module  and  the  others  has  to  do  with  the  ques¬ 
tion:  how  does  the  module  cooperate  or  compete  with  others  that  use  stimuli 
of  the  same  broadly  defined  physical  form?  For  auditory  localization,  the  key 
to the answer is the fact that the module is turned on by a specific and
readily  specifiable  characteristic  of  the  proximal  stimulus:  a  particular 
range  of  differences  in  time  of  arrival  at  the  two  ears.  Obviously,  such 
differences  have  no  other  utility  for  the  perceiver  but  to  provide  information 
about  the  distal  property,  location;  there  are  no  imaginable  ecological  cir¬ 
cumstances  in  which  a  person  could  use  this  characteristic  of  the  proximal 
stimuli  to  specify  some  other  distal  property.  Thus,  the  proximal  display  and 
the distal property it specifies only complement the other aspects of what a
listener  hears;  they  never  compete. 

In  phonetic  perception,  things  are  quite  different  because  important 
acoustic  cues  are  often  similar  to,  even  identical  with,  the  stimuli  that  in¬ 
form  listeners  about  a  variety  of  nonspeech  events.  We  have  already  remarked 
that,  in  isolation,  formant  transitions  sound  like  glissandi  or  chirps.  Now 
surely  we  don't  want  to  perceive  these  as  glissandi  or  chirps  when  we  are 
listening  to  speech,  but  we  do  want  to  perceive  them  so  when  we  are  listening 
to  music  or  to  birdsong.  If  this  is  true  for  all  of  the  speech  cues,  as  in 
some  sense  it  presumably  is,  then  it  is  hard  to  see  how  the  module  can  be 
turned  on  by  acoustic  stigmata  of  any  kind — that  is,  by  some  set  of  necessary 
cues  defined  in  purely  acoustic  terms.  We  will  consider  this  matter  in  some 
greater  detail  later.  For  now,  however,  the  point  is  only  that  cues  known  to 
be  of  great  importance  for  phonetic  events  may  be  cues  for  totally  unrelated 
nonphonetic  events,  too.  A  consequence  is  that,  in  contrast  to  the  generally 
complementary relation of the several modules that serve the same broadly defined
modality (e.g., depth and color in vision), the phonetic and auditory
modules  are  in  direct  competition.  (For  a  discussion  of  how  this  competition 
might  be  resolved,  see  Mattingly  &  Liberman,  1985.) 

Experimental  Evidence  for  the  Theory 

Having  briefly  described  one  motive  for  the  motor  theory — the  con¬ 
text-conditioned  variation  in  the  acoustic  cues  for  constant  phonetic  categor¬ 
ies — we  will  now  add  others.  We  will  limit  ourselves  to  the  so-called  segmen¬ 
tal  aspects  of  phonetic  structure,  though  the  theory  ought,  in  principle,  to 
apply  in  the  suprasegmental  domain  as  well  (cf.  Fowler,  1982). 

The  two  parts  of  the  theory — that  gestures  are  the  objects  of  perception 
and  that  perception  of  these  gestures  depends  on  a  specialized  module — might 
be  taken  to  be  independent,  as  they  were  in  their  historical  development,  but 
the  relevant  data  are  not.  We  therefore  cannot  rationally  apportion  the  data 
between  the  parts,  but  must  rather  take  them  as  they  come. 

A result of articulation: The multiplicity, variety, and equivalence of
cues for each phonetic percept. When speech synthesis began to be used as a
tool to investigate speech perception, it was soon discovered that, in any
specific context, a particular local property of the acoustic signal was
sufficient for the perception of one phonetic category rather than another
and, more generally, that the percept could be shifted along some phonetic
dimension by varying the synthetic stimulus along a locally definable acoustic
dimension. For example, if the onset frequency of the transition of the second
formant during a stop release is sufficiently low, relative to the frequency
of the following steady state, the stop is perceived as labial; otherwise,
as apical or dorsal (Liberman et al., 1954). A value along such an
acoustic  dimension  that  was  optimal  for  a  particular  phonetic  category,  or, 
more  loosely,  the  dimension  itself,  was  termed  an  "acoustic  cue." 


Liberman & Mattingly: The Motor Theory of Speech Perception Revised


Of  course,  the  fact  that  particular  acoustic  cues  can  be  isolated  must, 
of  itself,  tell  us  something  about  speech  perception,  for  it  might  have  been 
otherwise.  Thus,  it  is  possible  to  imagine  a  speech-perception  mechanism, 
equipped,  perhaps,  with  auditory  templates,  that  would  break  down  if  presented 
with  anything  other  than  a  wholly  natural  and  phonetically  optimal  stimulus. 
Listeners  would  either  give  conflicting  and  unreliable  phonetic  judgments  or 
else  not  hear  speech  at  all.  Clearly,  the  actual  mechanism  is  not  of  this 
kind,  and  the  concept  of  cue  accords  with  this  fact. 

Nevertheless,  the  emphasis  on  the  cues  has,  perhaps,  been  unfortunate, 
for  the  term  "cue"  might  seem  to  imply  a  claim  about  the  elemental  units  of 
speech  perception.  But  "cue"  was  simply  a  convenient  bit  of  laboratory  jargon 
referring  to  acoustic  variables  whose  definition  depended  very  much  on  the  de¬ 
sign  features  of  the  particular  synthesizers  that  were  used  to  study  them. 
The  cues,  as  such,  have  no  role  in  a  theory  of  speech  perception;  they  only 
describe  some  of  the  facts  on  which  a  theory  might  be  based  (cf.  Bailey  & 
Summerfield,  1980).  There  are,  indeed,  several  generalizations  about  the 
cues—some only hinted at by the data now available, others quite well
founded—that are relevant to such a theory.

One  such  generalization  is  that  every  "potential"  cue — that  is,  each  of 
the  many  acoustic  events  peculiar  to  a  linguistically  significant  gesture — is 
an  actual  cue.  (For  example,  every  one  of  eighteen  potential  cues  to  the 
voicing  distinction  in  medial  position  has  been  shown  to  have  some  perceptual 
value;  Lisker,  1978.)  All  possible  cues  have  not  been  tested,  and  probably 
never  will  be,  but  no  potential  cue  has  yet  been  found  that  could  not  be  shown 
to  be  an  actual  one. 

A  closely  related  generalization  is  that,  while  each  cue  is,  by  defini¬ 
tion,  more  or  less  sufficient,  none  is  truly  necessary.  The  absence  of  any 
single  cue,  no  matter  how  seemingly  characteristic  of  the  phonetic  category, 
can  be  compensated  for  by  others,  not  without  some  cost  to  naturalness  or  even 
intelligibility,  perhaps,  but  still  to  such  an  extent  that  the  intended  cate¬ 
gory is, in fact, perceived. Thus, stops can be perceived without silent periods,
fricatives without frication, vowels without formants, and tones without
pitch (Abramson, 1972; Inoue, 1984; Remez & Rubin, 1984; Repp, 1984;
Yeni-Komshian & Soli, 1981).

Yet  another  generalization  is  that  even  when  several  cues  are  present, 
variations in one can, within limits, be compensated for by offsetting variations
in another (Dorman, Raphael, & Liberman, 1979; Dorman, Studdert-Kennedy,
& Raphael, 1977; Hoffman, 1958; Howell & Rosen, 1983; Lisker, 1957;
Summerfield & Haggard, 1977). In the case of the contrast between fricative-vowel
and  fricative-stop-vowel  (as  in  [sa]  vs.  [sta]),  investigators  have  found  that 
two  important  cues,  silence  and  appropriate  formant  transitions,  engage  in 
just  such  a  trading  relation.  That  this  bespeaks  a  true  equivalence  in 
perception  was  shown  by  experiments  in  which  the  effect  of  variation  in  one 
cue  could,  depending  on  its  "direction,"  be  made  to  "add  to"  or  "cancel  out" 
the  effect  of  the  other  (Fitch,  Halwes,  Erickson,  &  Liberman,  1980).  Signif¬ 
icantly,  this  effect  can  also  be  obtained  with  sine-wave  analogues  of  speech, 
but  only  for  subjects  who  perceive  these  signals  as  speech,  not  for  those  who 
perceive  them  as  nonspeech  tones  (Best,  Morrongiello,  &  Robson,  1981). 
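
The notion of a trading relation can be made concrete with a small sketch. It is purely descriptive and is not a model proposed in the text: it assumes, for illustration only, that the two cues (silence duration and a hypothetical "transition strength") contribute additively to a logistic decision between [sa] and [sta]. All weights, the bias, and the function names are invented for the example.

```python
import math


def p_stop(silence_ms, transition_strength,
           w_silence=0.08, w_transition=2.0, bias=-4.0):
    """Probability of hearing [sta] rather than [sa] (hypothetical logistic model)."""
    evidence = w_silence * silence_ms + w_transition * transition_strength + bias
    return 1.0 / (1.0 + math.exp(-evidence))


def equivalent_silence_ms(delta_transition, w_silence=0.08, w_transition=2.0):
    """Change in silence (ms) that exactly offsets a change in the transition cue."""
    return -w_transition * delta_transition / w_silence


# A weaker transition cue is offset by a longer silence, leaving the percept unchanged.
baseline = p_stop(50.0, 0.0)
traded = p_stop(50.0 + equivalent_silence_ms(-0.5), -0.5)
```

In this sketch the two cues can be set to cancel (as in `traded`) or to add, which mirrors the "add to" and "cancel out" manipulation described above; it says nothing about why the cues should be equivalent, which is the question the theory addresses.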

Putting together all the generalizations about the multiplicity and variety
of acoustic cues, we should conclude that there is simply no way to define
a  phonetic  category  in  purely  acoustic  terms.  A  complete  list  of  the 
cues — surely  a  cumbersome  matter  at  best — is  not  feasible,  for  it  would 
necessarily  include  all  the  acoustic  effects  of  phonetically  distinctive 
articulations.  But  even  if  it  were  possible  to  compile  such  a  list,  the  re¬ 
sult  would  not  repay  the  effort,  because  none  of  the  cues  on  the  list  could  be 
deemed  truly  essential.  As  for  those  cues  that  might,  for  any  reason,  be  fi¬ 
nally  included,  none  could  be  assigned  a  characteristic  setting,  since  the  ef¬ 
fect  of  changing  it  could  be  offset  by  appropriate  changes  in  one  or  more  of 
the  others.  This  surely  tells  us  something  about  the  design  of  the  phonetic 
module.  For  if  phonetic  categories  were  acoustic  patterns,  and  if,  according¬ 
ly,  phonetic  perception  were  properly  auditory,  one  should  be  able  to  describe 
quite  straightforwardly  the  acoustic  basis  for  the  phonetic  category  and  its 
associated  percept.  According  to  the  motor  theory,  by  contrast,  one  would  ex¬ 
pect  the  acoustic  signal  to  serve  only  as  a  source  of  information  about  the 
gestures;  hence  the  gestures  would  properly  define  the  category.  As  for  the 
perceptual  equivalence  among  diverse  cues  that  is  shown  by  the  trading  rela¬ 
tions,  explaining  that  on  auditory  grounds  requires  ad  hoc  assumptions.  But 
if,  as  the  motor  theory  would  have  it,  the  gesture  is  the  distal  object  of 
perception,  we  should  not  wonder  that  the  several  sources  of  information  about 
it  are  perceptually  equivalent,  for  they  are  products  of  the  same  linguisti¬ 
cally  significant  gesture. 

A result of coarticulation: I. Segmentation in sound and percept.
Traditional  phonetic  transcription  represents  utterances  as  single  linear  se¬ 
quences  of  symbols,  each  of  which  stands  for  a  phonetic  category.  It  is  an 
issue  among  phonologists  whether  such  transcriptions  are  really  theoretically 
adequate,  and  various  alternative  proposals  have  been  made  in  an  effort  to 
provide  a  better  account.  This  matter  need  not  concern  us  here,  however, 
since  all  proposals  have  in  common  that  phonetic  units  of  some  description  are 
ordered  from  left  to  right.  Some  sort  of  segmentation  is  thus  always  implied, 
and  what  theory  must  take  into  account  is  that  the  perceived  phonetic  object 
is  thus  segmented. 

Segmentation  of  the  phonetic  percept  would  be  no  problem  for  theory  if 
the  proximal  sound  were  segmented  correspondingly.  But  it  is  not,  nor  can  it 
be,  if  speech  is  to  be  produced  and  perceived  efficiently.  To  maintain  a 
straightforward  relation  in  segmentation  between  phonetic  unit  and  signal 
would  require  that  the  sets  of  phonetic  gestures  corresponding  to  phonetic 
units  be  produced  one  at  a  time,  each  in  its  turn.  The  obvious  consequence 
would be that each unit would become a syllable, in which case talkers could
speak only as fast as they could spell. A function of coarticulation is to
evade  this  limitation.  There  is  an  important  consequence,  however,  which  is 
that  there  is  now  no  straightforward  correspondence  in  segmentation  between 
the  phonetic  and  acoustic  representations  of  the  information  (Fant,  1962; 
Joos, 1948). Thus, the acoustic information for any particular phonetic unit
is  typically  overlapped,  often  quite  thoroughly,  with  information  for  other 
units.  Moreover,  the  span  over  which  that  information  extends,  the  amount  of 
overlap,  and  the  number  of  units  signalled  within  the  overlapped  portion  all 
vary according to the phonetic context, the rate of articulation, and the language
(Magen, 1984; Manuel & Krakow, 1984; Ohman, 1966; Recasens, 1984; Repp,
Liberman,  Eccardt,  &  Pesetsky,  1978;  Tuller,  Harris,  &  Kelso,  1982). 

There are, perhaps, occasional stretches of the acoustic signal over
which  there  is  information  about  only  one  phonetic  unit  —  for  example,  in  the 
middle  of  the  frication  in  a  slowly  articulated  fricative-vowel  syllable  and 
in  vowels  that  are  sustained  for  artificially  long  times.  Such  stretches  do, 
of  course,  offer  a  relation  between  acoustic  patterns  and  phonetic  units  that 
would  be  transparent  if  phonetic  perception  were  merely  auditory.  But  even  in 
these  cases,  the  listener  automatically  takes  account  of,  not  just  the  trans¬ 
parent  part  of  the  signal,  but  the  regions  of  overlap  as  well  (Mann  &  Repp, 
1980,  1981;  Whalen,  1981).  Indeed,  the  general  rule  may  be  that  the  phonetic 
percept  is  normally  made  available  to  consciousness  only  after  all  the  rele¬ 
vant acoustic information is in, even when earlier cues might have been sufficient
(Martin & Bunnell, 1981, 1982; Repp et al., 1978).

What  wants  explanation,  then,  is  that  the  percept  is  segmented  in  a  way 
that  the  signal  is  not,  or,  to  put  it  another  way,  that  the  percept  does  not 
mirror  the  overlap  of  information  in  the  sound  (cf.  Fowler,  1984).  The  motor 
theory  does  not  provide  a  complete  explanation,  certainly  not  in  its  present 
state,  but  it  does  head  the  theoretical  enterprise  in  the  right  direction.  At 
the  very  least,  it  turns  the  theorist  away  from  the  search  for  those  unlikely 
processes  that  an  auditory  theory  would  suggest:  How  listeners  learn  phonetic 
labels  for  what  they  hear  and  thus  re-interpret  perceived  overlap  as  sequences 
of  discrete  units;  or  how  discrete  units  emerge  in  perception  from  interac¬ 
tions  of  a  purely  auditory  sort.  The  first  process  seems  implausible  on  its 
face,  the  second  because  it  presupposes  that  the  function  of  the  many  kinds 
and  degrees  of  coarticulation  is  to  produce  just  those  combinations  of  sounds 
that  will  interact  in  accordance  with  language-independent  characteristics  of 
the  auditory  system.  In  contrast,  the  motor  theory  begins  with  the  assumption 
that  coarticulation,  and  the  resulting  overlap  of  phonetic  information  in  the 
acoustic  pattern,  is  a  consequence  of  the  efficient  processes  by  which  dis¬ 
crete  phonetic  gestures  are  realized  in  the  behavior  of  more  or  less  indepen¬ 
dent  articulators.  The  theory  suggests,  then,  that  an  equally  efficient 
perceptual  process  might  use  the  resulting  acoustic  pattern  to  recover  the 
discrete  gestures. 

A  result  of  coarticulation:  II.  Different  sounds,  different  contexts, 
same  percept.  That  the  phonetic  percept  is  invariant  even  when  the  relevant 
acoustic  cue  is  not  was  the  characteristic  relation  between  percept  and  sound 
that  we  took  as  an  example  in  the  first  section.  There,  we  observed  that 
variation  in  the  acoustic  pattern  results  from  overlapping  of  putatively 
invariant  gestures,  an  observation  that,  as  we  remarked,  points  to  the  ges¬ 
ture,  rather  than  the  acoustic  pattern  itself,  as  the  object  of  perception. 
We  now  add  that  the  articulatory  variation  due  to  context  is  pervasive:  in 
the  acoustic  representation  of  every  phonetic  category  yet  studied  there  are 
context-conditioned  portions  that  contribute  to  perception  and  that  must, 
therefore,  be  taken  into  account  by  theory.  Thus,  for  stops,  nasals,  frica¬ 
tives,  liquids,  semivowels,  and  vowels,  the  always  context-sensitive  transi¬ 
tions  are  cues  (Harris,  1958;  Jenkins,  Strange,  &  Edman,  1983;  Liberman  et 
al., 1954; O'Connor, Gerstman, Liberman, Delattre, & Cooper, 1957; Strange,
Jenkins,  &  Johnson,  1983).  For  stops  and  fricatives,  the  noises  that  are  pro¬ 
duced  at  the  point  of  constriction  are  also  known  to  be  cues,  and,  under  some 
circumstances at least, these, too, vary with context (Dorman et al., 1977;
Liberman  et  al.,  1952;  Whalen,  1981). 

An auditory theory that accounts for invariant perception in the face of
so  much  variation  in  the  signal  would  require  a  long  list  of  apparently  arbi¬ 
trary  assumptions.  For  a  motor  theory,  on  the  other  hand,  systematic  stimulus 
variation  is  not  an  obstacle  to  be  circumvented  or  overcome  in  some  arbitrary 
way;  it  is,  rather,  a  source  of  information  about  articulation  that  provides 
important  guidance  to  the  perceptual  process  in  determining  a  representation 
of  the  distal  gesture. 

A result of coarticulation: III. Same sound, different contexts, different
percepts. When phonetic categories share one feature but differ in another,
the relation between acoustic pattern and percept speaks, again, to the
motor  theory  and  its  alternatives.  Consider,  once  more,  the  fricative  [s]  and 
the  stop  [t]  in  the  syllables  [sa]  and  [sta].  In  synthesis,  the  second-  and 
third-formant  transitions  can  be  the  same  for  these  two  categories,  since  they 
have the same place of articulation; and the first-formant transition, normally
a cue to manner, can be made ambiguous between them. For such stimuli, the
perception  of  [sta]  rather  than  [sa]  depends  on  whether  there  is  an  interval 
of  silence  between  the  noise  for  the  [s]  and  the  onsets  of  the  transitions. 

Data relevant to an interpretation of the role of silence in thus producing
different percepts from the same transition come from two kinds of experiments.
First are those that demonstrate the effectiveness of the transitions
as  cues  for  the  place  feature  of  the  fricative  in  fricative-vowel  syllables 
(Harris,  1958).  The  transitions  are  not,  therefore,  masked  by  the  noise  of 
the  [s]  friction,  and  thus  the  function  of  silence  in  a  stop  is  not,  as  it 
might  be  in  an  auditory  theory,  to  protect  the  transitions  from  such  masking. 
The  second  kind  of  experiment  deals  with  the  possibility  of  a  purely  auditory 
interaction — in  this  case,  between  silence  and  the  formant  transitions.  Among 
the findings that make such auditory interaction seem unlikely is that silence
affects  perception  of  the  formant  transitions  differently  in  and  out  of  speech 
context and, further, that the effectiveness of silence depends on such factors
as continuity of talker and prosody (Dorman et al., 1979; Rakerd, Dechovitz,
& Verbrugge, 1982). But perhaps the most direct test for auditory
interaction  is  provided  by  experiments  in  which  such  interaction  is  ruled  out 
by  holding  the  acoustic  context  constant.  This  can  be  done  by  exploiting  "du¬ 
plex  perception,"  a  phenomenon  to  be  discussed  in  greater  detail  in  the  next 
section.  Here  it  is  appropriate  to  say  only  that  duplex  perception  provides  a 
way  of  presenting  acoustic  patterns  so  that,  in  a  fixed  context,  listeners 
hear  the  same  second-  or  third-formant  transitions  in  two  phenomenally  differ¬ 
ent  ways  simultaneously:  as  nonspeech  chirps  and  as  cues  for  phonetic  cate¬ 
gories.  The  finding  is  that  the  presence  or  absence  of  silence  determines 
whether  formant  transitions  appropriate  for  [t]  or  for  [p],  for  example,  are 
integrated  into  percepts  as  different  as  stops  and  fricatives;  but  silence  has 
no  effect  on  the  perception  of  the  nonspeech  chirps  that  these  same  transi¬ 
tions  produce  (Liberman,  Isenberg,  &  Rakerd,  1981).  Since  the  latter  result 
eliminates  the  possibility  of  auditory  interaction,  we  are  left  with  the  ac¬ 
count  that  the  motor  theory  would  suggest:  that  silence  acts  in  the  special¬ 
ized  phonetic  mode  to  inform  the  listener  that  the  vocal  tract  was  completely 
closed  to  produce  a  stop  consonant,  rather  than  merely  constricted  to  produce 
a  fricative.  It  follows,  then,  that  silence  will,  by  its  presence  or  absence, 
determine  whether  identical  transitions  are  cues  in  percepts  that  belong  to 
the  one  manner  or  the  other. 

An acoustic signal diverges to phonetic and auditory modes. We noted
earlier  that  a  formant  transition  is  perceptually  very  different  depending  on 
whether  it  is  perceived  in  the  auditory  mode,  where  it  sounds  like  a  chirp,  or 
in  the  phonetic  mode,  where  it  cues  a  "nonchirpy"  consonant.  Of  course,  the 
comparison  is  not  entirely  fair,  since  acoustic  context  is  not  controlled: 
the  transition  is  presented  in  isolation  in  the  one  case,  but  as  an  element  of 
a  larger  acoustic  pattern  in  the  other.  We  should,  therefore,  call  attention 
to  the  fact  that  the  same  perceptual  difference  is  obtained  even  when,  by  re¬ 
sort  to  a  special  procedure,  acoustic  context  is  held  constant  (Liberman, 
1979; Rand, 1974). This procedure, which produces the duplex percept referred
to  earlier,  goes  as  follows.  All  of  an  acoustic  syllable  except  only  the  for¬ 
mant  transition  that  decides  between,  for  example,  [da]  and  [ga]  is  presented 
to  one  ear.  By  itself,  this  pattern,  called  the  "base,"  sounds  like  a 
stop-vowel  syllable,  ambiguous  between  [da]  and  [ga].  To  the  other  ear  is 
presented  one  or  the  other  of  the  transitions  appropriate  for  [d]  or  [g].  In 
isolation,  these  sound  like  different  chirps.  Yet,  when  base  and  transition 
are presented dichotically, and in the appropriate temporal relationship, they
give  rise  to  a  duplex  percept:  [da]  or  [ga],  depending  on  the  transition, 
and,  simultaneously,  the  appropriate  chirp.  (The  fused  syllable  appears  to  be 
in  the  ear  to  which  the  base  had  been  presented,  the  chirp  in  the  other.) 
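
The construction of the dichotic stimulus just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the cited experiments: signals are plain lists of samples, the assignment of base and transition to particular ears is arbitrary, and the function and parameter names are hypothetical.

```python
def dichotic_stimulus(base, transition, onset):
    """Build two equal-length channels for a duplex-perception trial.

    base:       samples of the syllable minus the critical formant transition
    transition: samples of the isolated formant transition
    onset:      sample index in the base at which the transition would occur
    Returns (base_ear, transition_ear).
    """
    if onset < 0 or onset + len(transition) > len(base):
        raise ValueError("transition must fit inside the time span of the base")
    transition_ear = [0.0] * len(base)            # silence apart from the transition
    transition_ear[onset:onset + len(transition)] = transition
    return list(base), transition_ear


# Toy example: a six-sample "base" and a two-sample "transition" starting at index 2.
base_ear, transition_ear = dichotic_stimulus([1.0] * 6, [0.5, -0.5], 2)
```

The `onset` argument stands in for the "appropriate temporal relationship" between base and transition; the sketch only aligns the two channels in time and makes no claim about what listeners then perceive.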

Two  related  characteristics  of  duplex  perception  must  be  emphasized.  One 
is  that  it  is  obtained  only  when  the  stimulus  presented  to  one  ear  is,  like 
the  "chirpy"  transition,  of  short  duration  and  extremely  unspeechlike  in 
quality.  If  that  condition  is  not  met,  as,  for  example,  when  the  first  two 
formants  are  presented  to  one  ear  and  the  entire  third  formant  to  the  other, 
perception  is  not  duplex.  It  is,  on  the  contrary,  simplex;  one  hears  a  coher¬ 
ent  syllable  in  which  the  separate  components  cannot  be  apprehended.  (A  very 
different  result  is  obtained  when  two  components  of  a  musical  chord  are 
presented  to  one  ear,  a  third  component  to  the  other.  In  that  case,  listeners 
can  respond  to  the  third  component  by  itself  and  also  to  that  component  com¬ 
bined  with  the  first  two  [Pastore,  Schmuckler,  Rosenblum,  &  Szczesiul,  1983].) 

The  other,  closely  related  characteristic  of  duplex  perception  is  that  it 
is  precisely  duplex,  not  triplex.  That  is,  listeners  perceive  the  nonspeech 
chirp  and  the  fused  syllable,  but  they  do  not  also  perceive  the  base — i.e., 
the  syllable,  minus  one  of  the  formant  transitions — that  was  presented  to  one 
ear  (Repp,  Milburn,  &  Ashkenas,  1983).  (In  the  experiment  with  musical  chords 
by  Pastore  et  al.,  referred  to  just  above,  there  was  no  test  for  duplex,  as 
distinguished  from  triplex,  perception.) 

The  point  is  that  duplex  perception  does  not  simply  reflect  the  ability 
of  the  auditory  system  to  fuse  dichotically  presented  stimuli  and  also,  as  in 
the  experiment  with  the  chords,  to  keep  them  apart.  Rather,  the  duplex 
percepts  of  speech  comprise  the  only  two  ways  in  which  the  transition,  for 
example, can be heard: as a cue for a phonetic gesture and as a nonspeech
sound.  These  percepts  are  strikingly  different,  and,  as  we  have  already  seen, 
they  change  in  different,  sometimes  contrasting  ways  in  response  to  variations 
in  the  acoustic  signals — variations  that  must  have  been  available  to  all 
structures  in  the  brain  that  can  process  auditory  information.  A  reasonable 
conclusion  is  that  there  must  be  two  modules  that  can  somehow  use  the  same  in¬ 
put  to  produce  simultaneous  representations  of  two  distal  objects.  (For 
speculation  about  the  mechanism  that  normally  prevents  perception  of  this  eco¬ 
logically  impossible  situation,  and  about  the  reason  why  that  highly  adaptive 
mechanism  might  be  defeated  by  the  procedures  used  to  produce  duplex  perception,
see  Mattingly  &  Liberman,  1985.)

Acoustic  and  optical  signals  converge  on  the  phonetic  mode.  In  duplex
perception,  a  single  acoustic  stimulus  is  processed  simultaneously  by  the 
phonetic  and  auditory  modules  to  produce  perception  of  two  distal  objects:  a 
phonetic  gesture  and  a  sound.  In  the  phenomenon  to  which  we  turn  now,  some¬ 
thing  like  the  opposite  occurs:  two  different  stimuli — one  acoustic,  the  oth¬ 
er  optical--are  combined  by  the  phonetic  module  to  produce  coherent  perception 
of  a  single  distal  event.  This  phenomenon,  discovered  by  McGurk  and  MacDonald
(1976),  can  be  illustrated  by  this  variant  on  their  original  demonstration. 
Subjects  are  presented  acoustically  with  the  syllables  [ba],  [ba],  [ba]  and
optically  with  a  face  that,  in  approximate  synchrony,  silently  articulates
[be],  [ve],  [ðe].  The  resulting  and  compelling  percept  is  [ba],  [va],  [ða],
with  no  awareness  that  it  is  in  any  sense  bimodal — that  is,  part  auditory  and
part  visual.  According  to  the  motor  theory,  this  is  so  because  the  perceived 
event  is  neither;  it  is,  rather,  a  gesture.  The  proximal  acoustic  signal  and 
the  proximal  optical  signal  have  in  common,  then,  that  they  convey  information 
about  the  same  distal  object.  (Perhaps  a  similar  convergence  is  implied  by 
the  finding  that  units  in  the  optic  tectum  of  the  barn  owl  are  bimodally  sen¬ 
sitive  to  acoustic  and  optical  cues  for  the  same  distal  property,  location  in 
space;  Knudsen,  1982). 

Even  prelinguistic  infants  seem  to  have  some  appreciation  of  the  relation
between  the  acoustic  and  optical  consequences  of  phonetic  articulation.  This 
is  to  be  inferred  from  an  experiment  in  which  it  was  found  that  infants  at 
four  to  five  months  of  age  preferred  to  look  at  a  face  that  articulated  the 
vowel  they  were  hearing  rather  than  at  the  same  face  articulating  a  different 
vowel  (Kuhl  &  Meltzoff,  1982).  Significantly,  this  result  was  not  obtained 
when  the  sounds  were  pure  tones  matched  in  amplitude  and  duration  to  the  vow¬ 
els.  In  a  related  study  it  was  found  that  infants  of  a  similar  age  looked 
longer  at  a  face  repeating  the  disyllable  they  were  hearing  than  at  the  same 
face  repeating  another  disyllable,  though  both  disyllables  were  carefully
synchronized  with  the  visible  articulation  (MacKain,  Studdert-Kennedy,  Spieker,  &
Stern,  1983).  Like  the  results  obtained  with  adults  in  the  McGurk-MacDonald 
kind  of  experiment,  these  findings  with  infants  imply  a  perception-production 
link  and,  accordingly,  a  common  mode  of  perception  for  all  proper  information 
about  the  gesture. 

The  general  characteristics  that  cause  acoustic  signals  to  be  perceived 
as  speech.  The  point  was  made  in  an  earlier  section  that  acoustic  definitions 
of  phonetic  contrasts  are,  in  the  end,  unsatisfactory.  Now  we  would  suggest 
that  acoustic  definitions  also  fail  for  the  purpose  of  distinguishing  in  general
between  acoustic  patterns  that  convey  phonetic  structures  and  those  that
do  not.  Thus,  speech  cannot  be  distinguished  from  nonspeech  by  appeal  to  sur¬ 
face  properties  of  the  sound.  Surely,  natural  speech  does  have  certain 
characteristics  of  a  general  and  superficial  sort — for  example,  formants  with 
characteristic  bandwidths  and  relative  intensities,  stretches  of  waveform 
periodicities  that  typically  mark  the  voiced  portion  of  syllables,  peaks  of 
intensity  corresponding  approximately  to  syllabic  rhythm,  etc. — and  these  can 
be  used  by  machines  to  detect  speech.  But  research  with  synthesizers  has 
shown  that  speech  is  perceived  even  when  such  general  characteristics  are  ab¬ 
sent.  This  was  certainly  true  in  the  case  of  many  of  the  acoustic  patterns 
that  were  used  in  work  with  the  Pattern  Playback  synthesizer,  and  more  recently
it  has  been  shown  to  be  true  in  the  most  extreme  case  of  patterns  consisting
only  of  sine  waves  that  follow  natural  formant  trajectories  (Remez,  Rubin,
Pisoni,  &  Carrell,  1981).  Significantly,  the  converse  effect  is  also  obtained.
When  reasonably  normal  formants  are  made  to  deviate  into  acoustically
continuous  but  abnormal  trajectories,  the  percept  breaks  into  two  categorically
distinct  parts:  speech  and  a  background  of  chirps,  glissandi,  and  assorted
noises  (Liberman  &  Studdert-Kennedy,  1978).  Of  course,  the  trajectories  of
the  formants  are  determined  by  the  movements  of  the  articulators.  Evidently,
those  trajectories  that  conform  to  possible  articulations  engage  the  phonetic 
module;  all  others  fail. 
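
As an illustrative aside, a sine-wave pattern of the kind just mentioned can be sketched in a few lines of Python. This is not the procedure used by Remez et al. (1981); the function name, the input format, the sample rate, and the formant values are all hypothetical and chosen only for illustration. The sketch replaces each formant with a single sinusoid whose frequency follows the formant's center-frequency track, accumulating phase sample by sample so that the tone glides without discontinuities.

```python
import math

def sine_wave_analog(formant_tracks, frame_dur=0.01, sample_rate=8000):
    """Sum one frequency-modulated sinusoid per formant track.

    formant_tracks: list of tracks, each a list of center frequencies in Hz,
    one value per frame (a hypothetical input format).
    """
    samples_per_frame = int(round(frame_dur * sample_rate))
    n_frames = len(formant_tracks[0])
    phases = [0.0] * len(formant_tracks)
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for k, track in enumerate(formant_tracks):
                # Advance the phase by the current frequency so the tone
                # glides smoothly instead of jumping at frame boundaries.
                phases[k] += 2.0 * math.pi * track[frame] / sample_rate
                value += math.sin(phases[k])
            # Average the sinusoids so the result stays within [-1, 1].
            out.append(value / len(formant_tracks))
    return out

# Hypothetical three-formant transition (Hz); these are not measured data.
f1 = [300 + 40 * i for i in range(10)]
f2 = [1000 + 20 * i for i in range(10)]
f3 = [2500] * 10
signal = sine_wave_analog([f1, f2, f3])
```

The resulting signal has none of the bandwidth, periodicity, or intensity characteristics of natural speech; only the trajectories are preserved, which is the property the argument above turns on.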

We  conclude  that  acoustic  patterns  are  identified  as  speech  by  reference 
to  deep  properties  of  a  linguistic  sort:  if  a  sound  can  be  "interpreted"  by 
the  specialized  phonetic  module  as  the  result  of  linguistically  significant 
gestures,  then  it  is  speech;  otherwise,  not.  (In  much  the  same  way,  grammati¬ 
cal  sentences  can  be  distinguished  from  ungrammatical  ones,  not  by  lists  of 
surface  properties,  but  only  by  determining  whether  or  not  a  grammatical 
derivation  can  be  given.)  Of  course,  the  kind  of  mechanism  such  an  "interpre¬ 
tation"  requires  is  the  kind  of  mechanism  the  motor  theory  presumes. 

Phonetic  and  auditory  responses  to  the  cues.  Obviously,  a  module  that
acts  on  acoustic  signals  cannot  respond  beyond  the  physiological  limits  of
those  parts  of  the  auditory  system  that  transmit  the  signal  to  the  module.
Within  those  limits,  however,  different  modules  can  be  sensitive  to  the  signals
in  different  ways.  Thus,  the  auditory-localization  module  enables
listeners  to  perceive  differences  in  the  position  of  sounding  objects  given
temporal  disparity  cues  smaller  by  several  orders  of  magnitude  than  those  required
to  make  the  listener  aware  of  temporal  disparity  as  such  (Brown  &
Deffenbacher,  1979,  Chap.  7;  Hirsh,  1959).  If  there  is,  as  the  motor  theory
implies,  a  distinct  phonetic  module,  then  in  like  manner  its  sensitivities
should  not,  except  by  accident,  be  the  same  as  those  that  characterize  the
module  that  deals  with  the  sounds  of  nonspeech  events.

In  this  connection,  we  noted  in  the  first  section  of  the  paper  that  one 
form  of  auditory  theory  of  speech  perception  points  to  auditory 
discontinuities  in  differential  sensitivity  (or  in  absolute  identification), 
taking  these  to  be  the  natural  bases  for  the  perceptual  discontinuities  that 
characterize  the  boundaries  of  phonetic  categories.  But  several  kinds  of 
experiments  strongly  imply  that  this  is  not  so. 

One  kind  of  experiment  has  provided  evidence  that  the  perceptual
discontinuities  at  the  boundaries  of  phonetic  categories  are  not  fixed;  rather,
they  move  in  accordance  with  the  acoustic  consequences  of  articulatory
adjustments  associated  with  phonetic  context,  dialect,  and  rate  of  speech.
(For  a  review,  see  Repp  &  Liberman,  in  press.)  To  account  for  such  articula¬ 
tion-correlated  changes  in  perceptual  sensitivities  by  appeal  to  auditory  pro¬ 
cesses  requires,  yet  again,  an  ultimately  countless  set  of  ad  hoc  assumptions 
about  auditory  interactions,  as  well  as  the  implausible  assumption  that  the 
articulators  are  always  able  to  behave  so  as  to  produce  just  those  sounds  that 
conform  to  the  manifold  and  complex  requirements  that  the  auditory  interac¬ 
tions  impose.  It  seems  hardly  more  plausible  that,  as  has  been  suggested,  the 
discontinuities  in  phonetic  perception  are  really  auditory  discontinuities 
that  were  caused  to  move  about  in  phylogenetic  or  ontogenetic  development  as  a 
result  of  experience  with  speech  (Aslin  &  Pisoni,  1980).  The  difficulty  with 
this  assumption  is  that  it  presupposes  the  very  canonical  form  of  the  cues 
that  does  not  exist  (see  above)  and,  also,  that  it  implies  a  contradiction  in 
assuming,  as  it  must,  that  the  auditory  sensitivities  underwent  changes  in  the 
development  of  speech,  yet  somehow  also  remained  unchanged  and  nonetheless 
manifest  in  the  adult's  perception  of  nonspeech  sounds.

Perhaps  this  is  the  place  to  remark  about  categorical  perception  that  the
issue  is  not,  as  is  often  supposed,  whether  nonspeech  continua  are  categori¬ 
cally  perceived,  for  surely  some  do  show  tendencies  in  that  direction.  The 
issue  is  whether,  given  the  same  (or  similar)  acoustic  continua,  the  auditory 
and  phonetic  boundaries  are  in  the  same  place.  If  there  are,  indeed,  auditory 
boundaries,  and  if,  further,  these  boundaries  are  replaced  in  phonetic  percep¬ 
tion  by  boundaries  at  different  locations  (as  the  experiments  referred  to 
above  do  indicate),  then  the  separateness  of  phonetic  and  auditory  perception 
is  even  more  strongly  argued  for  than  if  the  phonetic  boundaries  had  appeared 
on  continua  where  auditory  boundaries  did  not  also  exist. 

Also  relevant  to  comparison  of  sensitivity  in  phonetic  and  auditory  modes 
are  experiments  on  perception  of  acoustic  variations  when,  in  the  one  case, 
they  are  cues  for  phonetic  distinctions,  and  when,  in  some  other,  they  are 
perceived  as  nonspeech.  One  of  the  earliest  of  the  experiments  to  provide  data
about  the  nonspeech  side  of  this  comparison  dealt  with  perception  of
frequency-modulated  tones — or  "ramps"  as  they  were  called — that  bear  a  close
resemblance  to  the  formant  transitions.  The  finding  was  that  listeners  are 
considerably  better  at  perceiving  the  pitch  at  the  end  of  the  ramp  than  at  the 
beginning  (Brady,  House,  &  Stevens,  1961).  Yet,  in  the  case  of  stop  consonants
that  are  cued  by  formant  transitions,  perception  is  better  syllable-initially
than  syllable-finally,  though  in  the  former  case  it  requires  information
about  the  beginning  of  the  ramp,  while  in  the  latter  it  needs  to  know
about  the  end.  Thus,  if  one  were  predicting  sensitivity  to  speech  from 
sensitivity  to  the  analogous  nonspeech  sounds,  one  would  make  exactly  the 
wrong  predictions.  More  recent  studies  have  made  more  direct  comparisons  and 
found  differences  in  discrimination  functions  when,  in  speech  context,  formant 
transitions  cued  place  distinctions  among  stops  and  liquids,  and  when,  in  isolation,
the  same  transitions  were  perceived  as  nonspeech  sounds  (Mattingly  et
al.,  1971;  Miyawaki,  Strange,  Verbrugge,  Liberman,  Jenkins,  &  Fujimura,  1975). 

More  impressive,  perhaps,  is  evidence  that  has  come  from  experiments  in 
which  listeners  are  induced  to  perceive  a  constant  stimulus  in  different  ways. 
Here  belong  experiments  in  which  sine-wave  analogues  of  speech,  referred  to 
earlier,  are  presented  under  conditions  that  cause  some  listeners  to  perceive 
them  as  speech  and  others  not.  The  perceived  discontinuities  lie  at  different 
places  (on  the  acoustic  continuum)  for  the  two  groups  (Best  et  al.,  1981;  Best
&  Studdert-Kennedy,  1983;  Studdert-Kennedy  &  Williams,  1984;  Williams,
Verbrugge,  &  Studdert-Kennedy,  1983).  Here,  too,  belongs  an  experiment  in
which  the  formant-transitions  appropriate  to  a  place  contrast  between  stop 
consonants  are  presented  with  the  remainder  of  a  syllable  in  such  a  way  as  to 
produce  the  duplex  percept  referred  to  earlier:  the  transitions  cue  a  stop 
consonant  and,  simultaneously,  nonspeech  chirps.  The  result  is  that  listeners 
yield  quite  different  discrimination  functions  for  exactly  the  same  formant 
transitions  in  exactly  the  same  acoustic  context,  depending  on  whether  they 
are  responding  to  the  speech  or  nonspeech  sides  of  the  duplex  percept;  only  on 
the  speech  side  of  the  percept  is  there  a  peak  in  the  discrimination  function 
to  mark  a  perceptual  discontinuity  at  the  phonetic  boundary  (Mann  &  Liberman, 
1983). 

Finally,  we  note  that,  apart  from  differences  in  differential  sensitivity 
to  the  transitions,  there  is  also  a  difference  in  absolute-threshold 
sensitivity  when,  in  the  one  case,  these  transitions  support  a  phonetic  per¬ 
cept,  and  when,  in  the  other,  they  are  perceived  as  nonspeech  chirps. 
Exploiting,  again,  the  phenomenon  of  duplex  perception,  investigators  found
that  the  transitions  were  effective  (on  the  speech  side  of  the  percept)  in
cueing  the  contrast  between  stops  at  a  level  of  intensity  18  dB  lower  than
that  required  for  comparable  discrimination  of  the  chirps  (Bentin  &  Mann, 
1983).  At  that  level,  indeed,  listeners  could  not  even  hear  the  chirps,  let 
alone  discriminate  them;  yet  they  could  still  use  the  transitions  to  identify 
the  several  stops. 
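
To give a sense of scale for the 18-decibel difference just cited, the short Python sketch below converts a level difference in decibels to the corresponding ratios. The function name is hypothetical; the formulas are the standard decibel definitions (a factor of 10 per 10 dB for intensity, a factor of 10 per 20 dB for amplitude) and are not taken from Bentin and Mann (1983).

```python
def db_to_ratios(db):
    """Convert a level difference in decibels to (intensity, amplitude) ratios."""
    intensity_ratio = 10 ** (db / 10)   # power-like quantities
    amplitude_ratio = 10 ** (db / 20)   # pressure-like quantities
    return intensity_ratio, amplitude_ratio

intensity_ratio, amplitude_ratio = db_to_ratios(18)
print(round(intensity_ratio, 1), round(amplitude_ratio, 2))  # prints: 63.1 7.94
```

On these definitions, a reduction of 18 dB corresponds to roughly one sixty-third of the intensity, or about one eighth of the amplitude.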


The  Several  Aspects  of  the  Theory 

For  the  purpose  of  evaluating  the  motor  theory,  it  is  important  to  sepa¬ 
rate  it  into  its  more  or  less  independent  parts.  First,  and  fundamentally, 
there  is  the  claim  that  phonetic  perception  is  perception  of  gesture.  As  we 
have  seen,  this  claim  is  based  on  evidence  that  the  invariant  source  of  the 
phonetic  percept  is  somewhere  in  the  processes  by  which  the  sounds  of  speech 
are  produced.  In  the  first  part  of  this  section  we  will  consider  where  in 
those  processes  the  invariant  might  be  found. 

The  motor  theory  also  implies  a  tight  link  between  perception  and  produc¬ 
tion.  In  the  second  part  of  this  section  we  will  ask  how  that  link  came  to 
be. 


Where  is  the  Invariant  Phonetic  Gesture?  A  phonetic  gesture,  as  we  have 
construed  it,  is  a  class  of  movements  by  one  or  more  articulators  that  results 
in  a  particular,  linguistically  significant  deformation,  over  time,  of  the  vo¬ 
cal-tract  configuration.  The  linguistic  function  of  the  gesture  is  clear 
enough:  phonetic  contrasts,  which  are  of  course  the  basis  of  phonological 
categories,  depend  on  the  choice  of  one  particular  gesture  rather  than  anoth¬ 
er.  What  is  not  so  clear  is  how  the  gesture  relates  to  the  actual  physical 
movements  of  articulators  and  to  the  resulting  vocal-tract  configurations,  ob¬ 
served,  for  example,  in  x-ray  films. 

In  the  early  days  of  the  motor  theory,  we  made  a  simplifying  assumption 
about  this  relation:  that  a  gesture  was  effected  by  a  single  key  articulator. 
On  this  assumption,  the  actual  movement  trajectory  of  the  articulator  might 
vary,  but  only  because  of  aerodynamic  factors  and  the  physical  linkage  of  this 
articulator  with  others,  so  the  neural  commands  in  the  final  common  paths
(observable  with  electromyographic  techniques)  would  nevertheless  be  invariant 
across  different  contexts.  This  assumption  was  appropriate  as  an  initial 
working  hypothesis,  if  only  because  it  was  directly  testable.  In  the  event, 
there  proved  to  be  a  considerable  amount  of  variability  that  the  hypothesis  could
not  account  for. 

In  formulating  this  initial  hypothesis,  we  had  overlooked  several  serious 
complications.  One  is  that  a  particular  gesture  typically  involves  not  just 
one  articulator,  but  two  or  more;  thus  "lip  rounding,"  for  example,  is  a 
collaboration  of  lower  lip,  upper  lip,  and  jaw.  Another  is  that  a  single 
articulator  may  participate  in  the  execution  of  two  different  gestures  at  the 
same  time;  thus,  the  lips  may  be  simultaneously  rounding  and  closing  in  the 
production  of  a  labial  stop  followed  by  a  rounded  vowel,  e.g.,  [bu].  Prosody 
makes  additional  complicating  demands,  as  when  a  greater  displacement  of  some 
or  all  of  the  active  articulators  is  required  in  producing  a  stressed  syllable 
rather  than  an  unstressed  one;  and  linguistically  irrelevant  factors,  notably 
speaking  rate,  affect  the  trajectory  and  phasing  of  the  component  movements.

These  complications  might  suggest  that  there  is  little  hope  of  providing
a  rigorous  physical  definition  of  a  particular  gesture,  and  that  the  gestures 
are  hardly  more  satisfactory  as  perceptual  primitives  than  are  the  acoustic 
cues.  It  might,  indeed,  be  argued  that  there  is  an  infinite  number  of  possi¬ 
ble  articulatory  movements,  and  that  the  basis  for  categorizing  one  group  of 
such  movements  as  "lip  rounding"  and  another  as  "lip  closure"  is  entirely  a 
priori.

But  the  case  for  the  gesture  is  by  no  means  as  weak  as  this.  Though  we 
have  a  great  deal  to  learn  before  we  can  account  for  the  variation  in  in¬ 
stances  of  the  same  gesture,  it  is  nonetheless  clear  that,  despite  such  varia¬ 
tion,  the  gestures  have  a  virtue  that  the  acoustic  cues  lack:  instances  of  a 
particular  gesture  always  have  certain  topological  properties  not  shared  by 
any  other  gesture.  That  is,  for  any  particular  gesture,  the  same  sort  of  dis¬ 
tinctive  deformation  is  imposed  on  the  current  vocal-tract  configuration, 
whatever  this  "underlying"  configuration  happens  to  be.  Thus,  in  lip  round¬ 
ing,  the  lips  are  always  slowly  protruded  and  approximated  to  some  appreciable 
extent,  so  that  the  anterior  end  of  the  vocal  tract  is  extended  and  narrowed, 
though  the  relative  contributions  of  the  tongue  and  lips,  the  actual  degrees 
of  protrusion  and  approximation,  and  the  speed  of  articulatory  movement  vary 
according  to  context.  Perhaps  this  example  seems  obvious  because  lip  rounding 
involves  a  local  deformation  of  the  vocal-tract  configuration,  but  the  gener¬ 
alization  also  applies  to  more  global  gestures.  Consider,  for  example,  the 
gesture  required  to  produce  an  "open"  vowel.  In  this  gesture,  tongue,  lips, 
jaw,  and  hyoid  all  participate  to  contextually  varying  degrees,  and  the  actual 
distance  between  the  two  lips,  as  well  as  that  between  the  tongue  blade  and
body  and  the  upper  surfaces  of  the  vocal  tract,  are  variable;  but  the  goal  is 
always  to  give  the  tract  a  more  open,  horn-shaped  configuration  than  it  would 
otherwise  have  had. 

We  have  pointed  out  repeatedly  that,  as  a  consequence  of  gestural 
overlapping,  the  invariant  properties  of  a  particular  gesture  are  not  manifest
in  the  spectrum  of  the  speech  signal.  We  would  now  caution  that  a  further 
consequence  of  this  overlapping  is  that,  because  of  their  essentially  topolog¬ 
ical  character,  the  gestural  invariants  are  usually  not  obvious  from  inspec¬ 
tion  of  a  single  static  vocal-tract  configuration,  either.  They  emerge  only 
from  consideration  of  the  configuration  as  it  changes  over  time,  and  from 
comparison  with  other  configurations  in  which  the  same  gesture  occurs  in  dif¬ 
ferent  contexts,  or  different  gestures  in  the  same  context. 

We  would  argue,  then,  that  the  gestures  do  have  characteristic  invariant 
properties,  as  the  motor  theory  requires,  though  these  must  be  seen,  not  as 
peripheral  movements,  but  as  the  more  remote  structures  that  control  the  move¬ 
ments.  These  structures  correspond  to  the  speaker's  intentions.  What  is  far 
from  being  understood  is  the  nature  of  the  system  that  computes  the  topologi¬ 
cally  appropriate  version  of  a  gesture  in  a  particular  context.  But  this 
problem  is  not  peculiar  to  the  motor  theory;  it  is  familiar  to  many  who  study 
the  control  and  coordination  of  movement,  for  they,  like  us,  must  consider 
whether,  given  context-conditioned  variability  at  the  surface,  motor  acts  are 
nevertheless  governed  by  invariants  of  some  sort  (Browman  &  Goldstein,  1985;
Fowler,  Rubin,  Remez,  &  Turvey,  1980;  Tuller  &  Kelso,  1984;  Turvey,  1977).

The  Origin  of  the  Perception-Production  Link.  In  the  earliest  accounts 
of  the  motor  theory,  we  put  considerable  attention  on  the  fact  that  listeners 
not  only  perceive  the  speech  signal  but  also  produce  it.  This,  together  with
doctrinal  behaviorist  considerations,  led  us  to  assume  that  the  connection  between
perception  and  production  was  formed  as  a  wholly  learned  association,
and  that  perceiving  the  gesture  was  a  matter  of  picking  up  the  sensory  conse¬ 
quences  of  covert  mimicry.  On  this  view  of  the  genesis  of  the  percep¬ 
tion-production  link,  the  distinguishing  characteristic  of  speech  is  only  that 
it  provides  the  opportunity  for  the  link  to  be  established.  Otherwise,  ordi¬ 
nary  principles  of  associative  learning  are  adequate  to  the  task;  no  speciali¬ 
zation  for  language  is  required. 

But  then  such  phenomena  as  have  been  described  in  this  paper  were  discov¬ 
ered,  and  it  became  apparent  that  they  differed  from  anything  that  association 
learning  could  reasonably  be  expected  to  produce.  Nor  were  these  the  only 
relevant  considerations.  Thus,  we  learned  that  people  who  have  been  patholog¬ 
ically  incapable  from  birth  of  controlling  their  articulators  are  nonetheless 
able  to  perceive  speech  (MacNeilage,  Rootes,  &  Chase,  1967).  From  the  research
pioneered  by  Eimas,  Siqueland,  Jusczyk,  and  Vigorito  (1971),  we  also
learned  that  prelinguistic  infants  apparently  categorize  phonetic  distinctions
much  as  adults  do.  More  recently,  we  have  seen  that  even  when  the  distinction 
is  not  functional  in  the  native  language  of  the  subjects,  and  when,  according¬ 
ly,  adults  have  trouble  perceiving  it,  infants  nevertheless  do  quite  well  up 
to  about  one  year  of  age,  at  which  time  they  begin  to  perform  as  poorly  as 
adults  (Werker  &  Tees,  1984).  Perhaps,  then,  the  sensitivity  of  infants  to
the  acoustic  consequences  of  linguistic  gestures  includes  all  those  gestures 
that  could  be  phonetically  significant  in  any  language,  acquisition  of  one's 
native  language  being  a  process  of  losing  sensitivity  to  gestures  it  does  not 
use.  Taking  such  further  considerations  as  these  into  account,  we  have  become 
even  more  strongly  persuaded  that  the  phonetic  mode,  and  the  percep¬ 
tion-production  link  it  incorporates,  are  innately  specified. 

Seen,  then,  as  a  view  about  the  biology  of  language,  rather  than  a  com¬ 
ment  on  the  coincidence  of  speaking  and  listening,  the  motor  theory  bears  at 
several  points  on  our  thinking  about  the  development  of  speech  perception  in 
the  child.  Consider,  first,  a  linguistic  ability  that,  though  seldom  noted 
(but  see  Mattingly,  1976),  must  be  taken  as  an  important  prerequisite  to
acquiring  the  phonology  of  a  language.  This  is  the  ability  to  sort  acoustic 
patterns  into  two  classes:  those  that  contain  (candidate)  phonetic  structures 
and  those  that  do  not.  (For  evidence,  however  indirect,  that  infants  do  so 
sort,  see  Alegria  &  Noirot,  1982;  Best,  Hoffman,  &  Glanville,  1982;  Entus, 
1977;  Molfese,  Freeman,  &  Palermo,  1975;  Segalowitz  &  Chapman,  1980;  Witelson,
1977;  but  see  Vargha-Khadem  &  Corballis,  1979).  To  appreciate  the  bearing  of 
the  motor  theory  on  this  matter,  recall  our  claim,  made  in  an  earlier  section, 
that  phonetic  objects  cannot  be  perceived  as  a  class  by  reference  to  acoustic 
stigmata,  but  only  by  a  recognition  that  the  sounds  might  have  been  produced 
by  a  vocal  tract  as  it  made  linguistically  significant  gestures.  If  so,  the 
perception-production  link  is  a  necessary  condition  for  recognizing  speech  as 
speech.  It  would  thus  be  a  blow  to  the  motor  theory  if  it  could  be  shown  that 
infants  must  develop  empirical  criteria  for  this  purpose.  Fortunately  for  the 
theory,  such  criteria  appear  to  be  unnecessary. 

Consider,  too,  how  the  child  comes  to  know,  not  only  that  phonetic  structures
are  present,  but,  more  specifically,  just  what  those  phonetic  structures
are.  In  this  connection,  recall  that  information  about  the  string  of  phonetic 
segments  is  overlapped  in  the  sound,  and  that  there  are,  accordingly,  no 
acoustic  boundaries.  Until  and  unless  the  child  (tacitly)  appreciates  the 
gestural  source  of  the  sounds,  it  can  hardly  be  expected  to  perceive,  or  ever
learn  to  perceive,  a  phonetic  structure.  Recall,  too,  that  the  acoustic  cues
for  a  phonetic  category  vary  with  phonetic  factors  such  as  context  and  with 
extra-phonetic  factors  such  as  rate  and  vocal-tract  size.  This  is  to  say, 
once  again,  that  there  is  no  canonical  cue.  What,  then,  is  the  child  to 
learn?  Association  of  some  particular  cue  (or  set  of  cues)  with  a  phonetic 
category  will  work  only  for  a  particular  circumstance.  When  circumstances 
change,  the  child's  identification  of  the  category  will  be  wrong,  sometimes 
grossly,  and  it  is  hard  to  see  how  it  could  readily  make  the  appropriate 
correction.  Perception  of  the  phonetic  categories  can  properly  be  generalized 
only  if  the  acoustic  patterns  are  taken  for  what  they  really  are:  information 
about  the  underlying  gestures.  No  matter  that  the  child  sometimes  mistakes 
the  phonological  significance  of  the  gesture,  so  long  as  that  which  is  per¬ 
ceived  captures  the  systematic  nature  of  its  relation  to  the  sound;  the 
phonology  will  come  in  due  course.  To  appreciate  this  relation  is,  once 
again,  to  make  use  of  the  link  between  perception  and  production. 

How  "Direct"  is  Speech  Perception? 


Since  we  have  been  arguing  that  speech  perception  is  accomplished  without 
cognitive  translation  from  a  first-stage  auditory  register,  our  position  might 
appear  similar  to  the  one  Gibson  (1966)  has  taken  in  regard  to  "direct  perception."
The  similarity  to  Gibson's  views  may  seem  all  the  greater  because,  like
him,  we  believe  that  the  object  of  perception  is  motoric.  But  there  are  important
differences,  the  bases  for  which  are  to  be  seen  in  the  following
passage  (Gibson,  1966,  p.  94):

An  articulated  utterance  is  a  source  of  a  vibratory  field  in  the 
air.  The  source  is  biologically  'physical'  and  the  vibration  is 
acoustically  'physical'.  The  vibration  is  a  potential  stimulus, 
becoming  effective  when  a  listener  is  within  range  of  the  vibratory 
field.  The  listener  then  perceives  the  articulation  because  the 
invariants  of  vibration  correspond  to  those  of  articulation.  In 
this  theory  of  speech  perception,  the  units  and  parts  of  speech  are 
present  both  in  the  mouth  of  the  speaker  and  in  the  air  between  the 
speaker  and  listener.  Phonemes  are  in  the  air.  They  can  be  consid¬ 
ered  physically  real  if  the  higher-order  invariants  of  sound  waves 
are  admitted  to  the  realm  of  physics. 

The  first  difference  between  Gibson's  view  and  ours  relates  to  the  nature 
of  the  perceived  events.  For  Gibson,  these  are  actual  movements  of  the 
articulators,  while  for  us,  they  are  the  more  remote  gestures  that  the  speaker 
intended.  The  distinction  would  be  trivial  if  an  articulator  were  affected  by
only  one  gesture  at  a  time,  but,  as  we  have  several  times  remarked,  an  articu¬ 
latory  movement  is  usually  the  result  of  two  or  more  overlapping  gestures. 
The  gestures  are  thus  control  structures  for  the  observable  movements. 

The  second  difference  is  that,  unlike  Gibson,  we  do  not  think  articulato¬ 
ry  movements  (let  alone  phonetic  structures)  are  given  directly  (that  is, 
without  computation)  by  "higher-order  invariants"  that  would  be  plain  if  only 
we  had  a  biologically  appropriate  science  of  physical  acoustics.  We  would 
certainly  welcome  any  demonstration  that  such  invariants  did  exist,  since, 
even  though  articulatory  movement  is  not  equivalent  to  phonetic  structure, 
such  a  demonstration  would  permit  a  simpler  account  of  how  the  phonetic  module 
works.  But  no  higher-order  invariants  have  thus  far  been  proposed,  and  we
doubt  that  any  will  be  forthcoming.  We  would  be  more  optimistic  on  this  score
if  it  could  be  shown,  at  least,  that  articulatory  movements  can  be  recovered 
from  the  signal  by  computations  that  are  purely  analytic,  if  nevertheless  com¬ 
plex.  One  might  then  hope  to  reformulate  the  relationship  between  movements 
and  signal  in  a  way  that  would  make  it  possible  to  appeal  to  higher-order 
invariants  and  thus  obviate  the  need  for  computation.  But,  given  the 
many-to-one  relation  between  vocal-tract  configurations  and  acoustic  signal,  a 
purely  analytic  solution  to  the  problem  of  recovering  movements  from  the  sig¬ 
nal  seems  to  be  impossible  unless  one  makes  unrealistic  assumptions  about 
excitation,  damping,  and  other  physical  variables  (Sondhi,  1979).  We  there¬ 
fore  remain  skeptical  about  higher-order  invariants. 

The  alternative  to  an  analytic  account  of  speech  perception  is,  of 
course,  a  synthetic  one,  in  which  case  the  module  compares  some  parametric 
description  of  the  input  signal  with  candidate  signal  descriptions.  As  with 
any  form  of  "analysis-by-synthesis"  (cf.  Stevens  &  Halle,  1967),  such  an  ac¬ 
count  is  plausible  only  if  the  number  of  candidates  the  module  has  to  test  can 
be  kept  within  reasonable  bounds.  This  requirement  is  met,  however,  if,  as  we 
suppose,  the  candidate  signal  descriptions  are  computed  by  an  analog  of  the 
production  process — an  internal,  innately  specified  vocal-tract  synthesizer, 
as  it  were  (Liberman,  Mattingly,  &  Turvey,  1972;  Mattingly  &  Liberman, 
1969) — that  incorporates  complete  information  about  the  anatomical  and  physio¬ 
logical  characteristics  of  the  vocal  tract  and  also  about  the  articulatory  and 
acoustic  consequences  of  linguistically  significant  gestures.  Further  con¬ 
straints  become  available  as  experience  with  the  phonology  of  a  particular 
language reduces the inventory of possible gestures and provides information
about  the  phonotactic  and  temporal  restrictions  on  their  occurrence.  The  mod¬ 
ule  has  then  merely  to  determine  which  (if  any)  of  the  small  number  of  ges¬ 
tures  that  might  have  been  initiated  at  a  particular  instant  could,  in  combi¬ 
nation  with  gestures  already  in  progress,  account  for  the  signal. 

Thus,  we  would  claim  that  the  processes  of  speech  perception  are,  like 
other  linguistic  processes,  inherently  computational  and  quite  indirect.  If 
perception seems nonetheless immediate, it is not because the process is in
fact  straightforward,  but  because  the  module  is  so  well-adapted  to  its  complex 
task. 

The  Motor  Theory  and  Modularity 

In  attributing  speech  perception  to  a  "module,"  we  have  in  mind  the  no¬ 
tion  of  modularity  proposed  by  Fodor  (1983).  A  module,  for  Fodor,  is  a  piece 
of  neural  architecture  that  performs  the  special  computations  required  to  pro¬ 
vide  central  cognitive  processes  with  representations  of  objects  or  events  be¬ 
longing  to  a  natural  class  that  is  ecologically  significant  for  the  organism. 
This  class,  the  "domain"  of  the  module,  is  apt  also  to  be  "eccentric,"  for  the 
domain  would  be  otherwise  merely  a  province  of  some  more  general  domain,  for 
which another module must be postulated anyway. Besides domain-specificity
and specialized neural architecture, a module has other characteristic
properties. Because the perceptual process it controls is not cognitive, there is
little  or  no  possibility  of  awareness  of  whatever  computations  are  carried  on 
within the module ("limited central access"). Because the module is
specialized, it has a "shallow" output, consisting only of rigidly definable,
domain-relevant representations; accordingly, it processes only the
domain-relevant information in the input stimulus. Its computations are thus
much faster than those of the less specialized processes of central cognition.
Because of the ecological importance of its domain for the organism, the
operation of the
module  is  not  a  matter  of  choice,  but  "mandatory";  for  the  same  reason,  its 
computations  are  "informationally  encapsulated,"  that  is,  protected  from  cog¬ 
nitive  bias. 

Most  psychologists  would  agree  that  auditory  localization,  to  return  to 
an  example  we  have  mentioned  several  times,  is  controlled  by  specialized  pro¬ 
cesses  of  some  noncognitive  kind.  They  might  also  agree  that  its  properties 
are  those  that  Fodor  assigns  to  modules.  At  all  events,  they  would  set  audi¬ 
tory  localization  apart  from  such  obviously  cognitive  activities  as  playing 
chess,  proving  theorems,  and  recognizing  a  particular  chair  as  a  token  of  the 
type  called  "chair."  As  for  perception  of  language,  the  consensus  is  that  it 
qualifies  as  a  cognitive  process  par  excellence,  modular  only  in  that  it  is 
supported  by  the  mechanisms  of  the  auditory  modality.  But  in  this,  we  and 
Fodor  would  argue,  the  consensus  is  doubly  mistaken:  the  perception  of  lan¬ 
guage  is  neither  cognitive  nor  auditory.  The  events  that  constitute  the  do¬ 
main  of  linguistic  perception,  however  they  may  be  defined,  must  certainly  be 
an  ecologically  significant  natural  class,  and  it  has  been  recognized  since 
Broca  that  linguistic  perception  is  associated  with  specialized  neural 
architecture.  Evidently,  linguistic  perception  is  fast  and  mandatory;  argu¬ 
ably,  it  is  informationally  encapsulated — that  is,  its  phonetic,  morphological 
and  syntactic  analyses  are  not  biassed  by  knowledge  of  the  world — and  its  out¬ 
put  is  shallow — that  is,  it  produces  a  linguistic  description  of  the  utter¬ 
ance,  and  only  this.  These  and  other  considerations  suggest  that,  like  audi¬ 
tory  localization,  perception  of  language  rests  on  a  specialization  of  the 
kind  that  Fodor  calls  a  module. 

The  data  that  have  led  us  in  the  past  to  claim  that  "speech  is  special" 
and to postulate a "speech mode" of perception can now be seen to be
consistent with Fodor's claims about modularity, and especially about the modularity
of  language.  (What  we  have  been  calling  a  phonetic  module  is  then  more  prop¬ 
erly  called  a  linguistic  module.)  Thus,  as  we  have  noted,  speech  perception 
uses all the information in the stimulus that is relevant to phonetic
structures: every potential cue proves to be an actual cue. This holds true even
across modalities: relevant optical information combines with relevant
acoustic information to produce a coherent phonetic percept in which, as in the
example  described  earlier,  the  bimodal  nature  of  the  stimulation  is  not 
detectable.  In  contrast,  irrelevant  information  in  the  stimulus  is  not  used: 
the  acoustic  properties  that  might  cause  the  transitions  to  be  heard  as  chirps 
are  ignored — or  perhaps  we  should  say  that  the  auditory  consequences  of  those 
properties  are  suppressed — when  the  transitions  are  in  context  and  the 
linguistic module is engaged. The exclusion of the irrelevant extends, of
course, to stimulus information about voice quality, which helps to identify
the  speaker  (perhaps  by  virtue  of  some  other  module)  but  has  no  phonetic  im¬ 
portance,  and  even  to  that  extraphonetic  information  which  might  have  been 
supposed  to  help  the  listener  distinguish  sounds  that  contain  phonetic  struc¬ 
tures  from  those  that  do  not.  As  we  have  seen,  even  when  synthetic  speech 
lacks  the  acoustic  properties  that  would  make  it  sound  natural,  it  will  be 
treated as speech if it contains sufficiently coherent phonetic information.
Moreover,  it  makes  no  difference  that  the  listener  knows,  or  can  determine  on 
auditory  grounds,  that  the  stimulus  was  not  humanly  produced;  because  linguis¬ 
tic  perception  is  informationally  encapsulated  and  mandatory,  the  listener 
will hear synthetic speech as speech.

As might be expected, the linguistic module is also very good at excluding
from consideration the acoustic effects of unrelated objects and events in
the  environment;  the  resistance  of  speech  perception  to  noise  and  distortion 
is  well  known.  These  other  objects  and  events  are  still  perceived,  because 
they  are  dealt  with  by  other  modules,  but  they  do  not,  within  surprisingly 
wide limits, interfere with speech perception (cf. Darwin, 1984). On the
other hand, the module is not necessarily prepared for non-ecological conditions,
as the phenomenon of duplex perception illustrates. Under the conditions of
duplex  perception  the  module  makes  a  mistake  it  would  never  normally  make:  it 
treats  the  same  acoustic  information  both  as  speech  and  as  nonspeech.  And, 
being  an  informationally  encapsulated  and  mandatorily  operating  mechanism,  it 
keeps  on  making  the  same  mistake,  whatever  the  knowledge  or  preference  of  the 
listener.

Our claim that the invariants of speech perception are phonetic gestures
is  much  easier  to  reconcile  with  a  modular  account  of  linguistic  perception 
than  with  a  cognitive  account.  On  the  latter  view,  the  gestures  would  have  to 
be  inferred  from  an  auditory  representation  of  the  signal  by  some  cognitive 
process,  and  this  does  not  seem  to  be  a  task  that  would  be  particularly  conge¬ 
nial  to  cognition.  Parsing  a  sentence  may  seem  to  bear  some  distant 
resemblance  to  the  proving  of  theorems,  but  disentangling  the  mutually 
confounding  auditory  effects  of  overlapping  articulations  surely  does  not.  It 
is  thus  quite  reasonable  for  proponents  of  a  cognitive  account  to  reject  the 
possibility  that  the  invariants  are  motoric  and  to  insist  that  they  are  to  be 
found  at  or  near  the  auditory  surface,  heuristic  matching  of  auditory  tokens 
to  auditory  prototypes  being  perfectly  plausible  as  a  cognitive  process. 

Such  difficulties  do  not  arise  for  our  claim  on  the  modular  account.  If 
the  invariants  of  speech  are  phonetic  gestures,  it  merely  makes  the  domain  of 
linguistic  perception  more  suitably  eccentric;  if  the  invariants  were  audi¬ 
tory,  the  case  for  a  separate  linguistic  module  would  be  the  less  compelling. 
Moreover, computing these invariants from the acoustic signal is a task for
which  there  is  no  obvious  parallel  among  cognitive  processes.  What  is  re¬ 
quired  for  this  task  is  not  a  heuristic  process  that  draws  on  some  general 
cognitive  ability  or  on  knowledge  of  the  world,  but  a  special-purpose  computa¬ 
tional  device  that  relates  gestural  properties  to  the  acoustic  patterns. 

It  remains,  then,  to  say  how  the  set  of  possible  gestures  is  specified 
for  the  perceiver.  Does  it  depend  on  tacit  knowledge  of  a  kind  similar,  per¬ 
haps,  to  that  which  is  postulated  by  Chomsky  to  explain  the  universal  con¬ 
straints  on  syntactic  and  phonological  form?  We  think  not,  because  knowledge 
of  the  acoustic-phonetic  properties  of  the  vocal  tract,  unlike  other  forms  of 
tacit  knowledge,  seems  to  be  totally  inaccessible;  no  matter  how  hard  they 
try, even post-perceptually, listeners cannot recover aspects of the process
(for example, the acoustically different transitions) by which they might
have  arrived  at  the  distal  object.  But,  surely,  this  is  just  what  one  would 
expect  if  the  specification  of  possible  vocal-tract  gestures  is  not  tacit 
knowledge  at  all,  but  rather  a  direct  consequence  of  the  eccentric  properties 
of  the  module  itself.  As  already  indicated,  we  have  in  earlier  papers  sug¬ 
gested  that  speech  perception  is  accomplished  by  virtue  of  a  model  of  the  vo¬ 
cal  tract  that  embodies  the  relation  between  gestural  properties  and  acoustic 
information. Now we would add that this model must be part of the very
structure of the language module. In that case, there would be, by Fodor's
account, an analogy with all other linguistic universals.

Perception and Production: One Module or Two?

For  want  of  a  better  word,  we  have  spoken  of  the  relation  between  speech 
perception  and  speech  production  as  a  "link,"  perhaps  implying  thereby  that 
these  two  processes,  though  tightly  bonded,  are  nevertheless  distinct.  Much 
the  same  implication  is  carried,  more  generally,  by  Fodor's  account  of 
modularity,  if  only  because  his  attention  is  almost  wholly  on  perception.  We 
take  pains,  therefore,  to  disown  the  implication  of  distinctness  that  our  own 
remarks  may  have  conveyed,  and  to  put  explicitly  in  its  place  the  claim  that, 
for  language,  perception  and  production  are  only  different  sides  of  the  same 
coin. 

To  make  our  intention  clear,  we  should  consider  how  language  differs  from 
those  other  modular  arrangements  in  which,  as  with  language,  perception  and 
action  both  figure  in  some  functional  unity:  simple  reflexes,  for  example;  or 
the  system  that  automatically  adjusts  the  posture  of  a  diving  gannet  in 
accordance  with  optical  information  that  specifies  the  time  of  contact  with 
the surface of the water (Lee & Reddish, 1981). The point about such systems
is that the stimuli do not resemble the responses, however intimate the
connection between them. Hence, the detection of the stimulus and the
initiation of the response must be managed by separate components of the module.
Indeed,  it  would  make  no  great  difference  if  these  cases  were  viewed  as  an  in¬ 
put  module  hardwired  to  an  output  module. 

Language  is  different:  the  neural  representation  of  the  utterance  that 
determines the speaker's production is the distal object that the listener
perceives; accordingly, speaking and listening are both regulated by the same
structural  constraints  and  the  same  grammar.  If  we  were  to  assume  two  mod¬ 
ules,  one  for  speaking  and  one  for  listening,  we  should  then  have  to  explain 
how  the  same  structures  evolved  for  both,  and  how  the  representation  of  the 
grammar  acquired  by  the  listening  module  became  available  to  the  speaking  mod¬ 
ule. 

So, if it is reasonable to assume that there is such a thing as a language
module, then it is even more reasonable to assume that there is only one. And
if, within that module, there are subcomponents that correspond to the several
levels of linguistic performance, then each of these subcomponents
must  deal  both  with  perception  and  production.  Thus,  if  sentence  planning  is 
the  function  of  a  particular  subcomponent,  then  sentence  parsing  is  a  function 
of  the  same  subcomponent,  and  similarly,  mutatis  mutandis,  for  speech  produc¬ 
tion  and  speech  perception.  And,  finally,  if  all  this  is  true,  then  the  cor¬ 
responding  input  and  output  functions  must  themselves  be  as  computationally 
similar  as  the  inherent  asymmetry  between  production  and  perception  permits, 
just  as  they  are  in  man-made  communication  devices. 

These  speculations  do  not,  of  course,  reveal  the  nature  of  the  computa¬ 
tions  that  the  language  module  carries  out,  but  they  do  suggest  a  powerful 
constraint  on  our  hypotheses  about  them,  a  constraint  for  which  there  is  no 
parallel  in  the  case  of  other  module  systems.  Thus,  they  caution  that,  among 
all  plausible  accounts  of  language  input,  we  should  take  seriously  only  those 
that  are  equally  plausible  as  accounts  of  language  output;  if  a  hypothesis 
about parsing cannot be readily restated as a hypothesis about
sentence-planning, for example, we should suppose that something is wrong with it.

Whatever the weaknesses of the motor theory, it clearly does conform to
this  constraint,  since,  by  its  terms,  speech  production  and  speech  perception 
are  both  inherently  motoric.  On  the  one  side  of  the  module,  the  motor  ges¬ 
tures  are  not  the  means  to  sounds  designed  to  be  congenial  to  the  ear;  rather, 
they  are,  in  themselves,  the  essential  phonetic  units.  On  the  other  side,  the 
sounds  are  not  the  true  objects  of  perception,  made  available  for  linguistic 
purposes in some common auditory register; rather, they only supply the
information for immediate perception of the gestures.

LINGUISTIC AND ACOUSTIC CORRELATES OF THE PERCEPTUAL STRUCTURE FOUND IN AN
INDIVIDUAL  DIFFERENCES  SCALING  STUDY  OF  VOWELS* 


Brad Rakerd† and Robert R. Verbrugge††


Abstract. Subjects judged the similarities among a set of American
English vowels (/i, ɪ, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/) presented in isolation or in
a /dVd/ consonantal frame. Individual differences scaling was
employed to analyze these similarities data for each of the conditions
separately  and  for  the  two  conditions  combined.  In  all  cases, 
perceptual  dimensions  corresponding  to  the  advancement,  height,  and 
tenseness  vowel  features  were  recovered.  Given  the  determinacy  of 
individual  differences  scaling,  this  finding  is  taken  to  provide 
strong  evidence  for  the  perceptual  significance  of  those  features. 

The  perceptual  dimensions  are  considered  in  relation  to  various 
acoustic  parameters  of  the  stimuli  employed  in  this  study.  They  are 
also  considered  in  relation  to  perceptual  dimensions  that  have  been 
observed  in  other  vowel  scaling  studies. 

Introduction 

Multidimensional  scaling  provides  a  means  of  modeling  the  psychological 
structure  that  is  reflected  in  perceptual  judgments.  Scaling  is  particularly 
useful  because  judgments  regarding  a  large  number  of  stimuli  can  very  often  be 
modeled with a structure of relatively few dimensions, and because those
dimensions can then be interpreted in terms of properties familiar to an
investigator (Carroll & Wish, 1974; Kruskal & Wish, 1973). In the domain of
vowel  perception,  investigators  have  frequently  found  that  the  dimensions 
revealed  by  scaling  can  be  related  to  various  phonological  features,  a  fact 
which  is  taken  to  imply  that  those  features  play  a  significant  perceptual  role 
(e.g., Fox, 1983; Singh & Woods, 1970; Shepard, 1972).

The  strength  of  that  implication  is,  however,  contingent  on  the  type  of 
scaling  method  that  is  used  in  a  study.  One  class  of  scaling  techniques 
yields solutions for which no single interpretation is possible. This is
because the models of psychological structure, which are spatial in
character, lack a fixed orientation for their axes. One must therefore rotate
these structures in search of an orientation that permits interpretation of
the dimensions. There are an infinite number of possible rotations and the
search must be constrained by an investigator's a priori notions regarding
interpretation. Any conclusions drawn are correspondingly vulnerable to the
challenge that some alternative interpretation would have been equally
supported by the data had some other rotation been carried out.

A second class of scaling techniques cannot be challenged on these
grounds because they specify a fixed orientation of dimension axes for their
models of psychological structure. This class, the individual differences
scaling techniques, achieves its added determinacy by modeling multiple sets
of data simultaneously, each set reflecting the performance of a different
subject.* An important underlying assumption of individual differences
scaling is that when judging a common set of stimuli subjects can differ from
one another in terms of the relative weights they attach to a set of shared
perceptual dimensions, but not in terms of the identity of the dimensions
themselves (Carroll & Chang, 1970). Except in unusual cases, there is one and
only one orientation in which the shared dimensions can be weighted so as to
account optimally for the variance in those subjects' data. That is the
orientation recovered by individual differences scaling. It has been
conjectured that with a well-defined perceptual task the dimensions revealed
by individual differences scaling will correspond to fundamental sensory or
judgmental processes (Carroll & Chang, 1970). There are a number of instances
in which that conjecture has been supported (Wish & Carroll, 1974).
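
The contrast between the two classes of techniques can be made concrete with a small numerical sketch. Individual differences scaling in the sense of Carroll and Chang (1970) fits a weighted Euclidean model, in which the distance between stimuli i and j for subject s is sqrt(sum over k of w_sk * (x_ik - x_jk)^2). The coordinates and weights below are invented for illustration and are not data from this study; the rotation angle is likewise arbitrary.

```python
import math


def rotate(points, degrees):
    """Rotate 2-D points about the origin by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def weighted_distances(points, weights):
    """All pairwise distances under sqrt(sum_k w_k * (x_ik - x_jk)**2)."""
    return [
        math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))
        for idx, p in enumerate(points)
        for q in points[idx + 1:]
    ]


def max_change(points, weights, degrees):
    """Largest change in any pairwise distance after rotating the axes."""
    before = weighted_distances(points, weights)
    after = weighted_distances(rotate(points, degrees), weights)
    return max(abs(a - b) for a, b in zip(before, after))


# Hypothetical group space for three stimuli in two dimensions.
group_space = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]

# Equal weights: the ordinary (unweighted) Euclidean case.
unweighted_change = max_change(group_space, (1.0, 1.0), 30)
# Hypothetical subject who stresses dimension 1 over dimension 2.
weighted_change = max_change(group_space, (2.0, 0.5), 30)
print(unweighted_change, weighted_change)
```

With equal weights the rotated configuration reproduces the same distances, so the data cannot choose among orientations. Once subjects weight the dimensions unequally, a rotation changes the predicted distances, and that is what allows the data to fix the orientation of the axes.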

In  this  paper,  we  report  on  an  individual  differences  scaling  study  of 
vowel  perception.  It  was  conducted  to  address  questions  about  the  potential 
influences  that  consonantal  context  can  exert  on  vowel  perception,  and  else¬ 
where  (Rakerd,  1984)  we  have  considered  the  results  in  that  regard.  We  did  so 
by comparing the weights that subjects attached to a set of shared perceptual
dimensions,  depending  on  whether  they  heard  vowels  in  or  out  of  a  consonantal 
frame.  Our  concern  here  is  not  with  the  weights,  however,  but  with  the  shared 
dimensions  themselves.  Those  dimensions  can  be  usefully  compared  with 
linguistic  features  that  have  been  found  to  be  related  to  perceptual  structure 
in other scaling studies (e.g., Fox, 1982, 1983; Terbeek, 1977), particularly
those conducted with less determinate scaling techniques (Hanson, 1967; Pols,
van der Kamp, & Plomp, 1969; Shepard, 1972; Singh & Woods, 1970). That is the
first  purpose  of  this  paper.  We  examine  subjects  who  judged  vowels  in 
consonantal  context  and  subjects  who  judged  isolated  vowels,  analyzing  their 
data  both  separately  and  in  combination. 

The  second  purpose  of  this  paper  is  to  report  on  correlations  between  the 
perceptual  structure  revealed  by  individual  differences  scaling  and  various 
acoustic  parameters  of  our  vowel  stimuli.  Though  based  on  a  limited  number  of 
stimulus  tokens,  those  correlations  are  suggestive  in  that  they  speak  to 
hypotheses  that  previous  investigators  have  put  forth  regarding  relationships 
between  vowel  features  and  the  acoustic  signal. 

I.  Methods 


A.  Subjects 

Twenty-three  subjects  participated  in  this  experiment.  All  of  them  were 
native  speakers  of  English  with  normal  hearing  according  to  self-report. 
Twelve of the subjects were randomly assigned to make perceptual judgments
regarding vowels in consonantal context. The remaining 11 subjects judged
vowels in isolation.

B.  Stimuli 


The  stimuli  for  the  experiment  were  ten  different  American  English  vowels 
(/i, ɪ, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/) spoken by a male talker with a general American
dialect. For the consonantal context condition, he produced those vowels in the
trisyllabic  frame  /hodVda/,  with  stress  placed  on  the  second  syllable  (/dVd/). 
For  the  isolated  condition,  he  produced  them  with  no  surrounding  phonetic  con¬ 
text  (/#V#/).  Two  tokens  of  each  vowel  were  produced  in  each  condition. 
Recordings  of  those  tokens  were  digitized  at  a  sampling  rate  of  10  kHz  and 
stored in separate computer files.

C. Procedure


Subjects  were  tested  individually.  Their  task  was  to  judge  the  similari¬ 
ty  relations  that  they  perceived  among  the  ten  different  vowels.  They  were 
instructed  to  base  those  judgments  on  properties  of  the  vowel  sounds  that 
seemed to them to distinguish words in English (Carlson & Granström, 1979;
Klatt,  1979).  The  similarity  judgments  were  made  with  a  triadic  comparisons 
method  that  has  been  employed  in  previous  vowel  perception  studies  (Pols  et 
al.,  1969;  Terbeek,  1977;  Terbeek  &  Harshman,  1971).  According  to  this  proce¬ 
dure,  three  of  the  ten  vowels  were  rated  on  each  experimental  trial.  Subjects 
listened  to  these  vowels  in  any  order  that  they  chose  and  as  often  as  they 
chose.*  They  then  reported  which  two  of  the  three  vowels  sounded  most  alike 
to  them  and  which  two  least  alike.  Over  trials,  all  possible  triads  were 
judged. The judgments were then summed across trials, with a score of +1
assigned to all most-alike pairs and -1 to all least-alike pairs. This yielded
a single (symmetrical) matrix of similarity judgments for each subject.
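
The scoring rule can be sketched in a few lines. Ten stimuli yield 120 distinct triads, and each pair of vowels occurs in 8 of them, so a cell of the summed matrix can range from -8 to +8. The listener below is a hypothetical stand-in (similarity simply falls off with the distance between stimulus indices) and is not a model of the subjects in this study.

```python
from itertools import combinations


def score_triads(n_stimuli, judge):
    """Sum triadic judgments into one symmetric similarity matrix.

    judge(triad) must return (most_alike_pair, least_alike_pair), where each
    pair is a tuple of two stimulus indices drawn from the triad.
    """
    sim = [[0] * n_stimuli for _ in range(n_stimuli)]
    for triad in combinations(range(n_stimuli), 3):
        most, least = judge(triad)
        # +1 for the most-alike pair, -1 for the least-alike pair.
        for (a, b), score in ((most, 1), (least, -1)):
            sim[a][b] += score
            sim[b][a] += score
    return sim


def toy_judge(triad):
    """Hypothetical listener: similarity falls off with index distance."""
    pairs = list(combinations(triad, 2))

    def gap(pair):
        return abs(pair[0] - pair[1])

    return min(pairs, key=gap), max(pairs, key=gap)


n_triads = len(list(combinations(range(10), 3)))
sim = score_triads(10, toy_judge)
print(n_triads, sim[0][1], sim[0][9])
```

Because every triad contributes one +1 and one -1, the entries above the diagonal always sum to zero; the matrix records relative similarity rather than an absolute scale.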

D.  Data  analysis 

The matrices for all 23 subjects who participated in the experiment were
submitted to nonmetric individual differences scaling, using the ALSCAL
procedure developed by Takane, Young, and de Leeuw (1977). It was determined that a
three-dimensional  scaling  solution  was  most  appropriate  for  the  data.  That 
decision  was  based  on  several  factors.  First,  modeling  in  three  dimensions 
accounted  for  a  substantially  greater  percentage  of  variance  (an  average  of 
70% for each subject) than modeling in two dimensions (60%), and only
marginally less than modeling in four dimensions (72%). Second, the three
dimensions were readily interpretable from a linguistic standpoint. And finally,
those dimensions were quite stable, in that they were also found in separate
analyses of the two experimental conditions (see Sec. II) and, with certain
modeling constraints, in the scaling solution for a memory study (Rakerd,
1984) that complemented this perceptual study.

For additional details concerning the data analysis, as well as other
aspects of the experimental method, see Rakerd (1984).

II.  The  Scaling  Solutions 

We  first  consider  the  perceptual  dimensions  that  emerged  from  an  analysis 
of data matrices for all 23 subjects. Although these dimensions have been
described elsewhere by Rakerd (1984), they are examined here in greater detail,
with particular attention paid to comparisons with phonological features of
linguistic description for vowels, and with dimensions that have been reported
in previous scaling studies of vowels. In the second part of Sec. II, we
describe the perceptual dimensions that resulted from separate analyses of the
consonantal-context and isolated conditions of the study.

A. The two conditions combined

1. Dimensions 1 and 2

Dimension 2 (D2) of the scaling solution for all subjects is plotted
against dimension 1 (D1) in the top half of Fig. 1. The distribution of
vowels in this plane is clearly related to the traditional "vowel quadrilateral"
(Ladefoged, 1975; Lindau, 1978), with D1 corresponding to the advancement
feature of vowels, and D2 to the height feature. There is considerable
precedent for observing correlates of these two phonological features in vowel
scaling  studies  (Fox,  1982,  1983;  Hanson,  1967;  Pols  et  al.,  1969;  Shepard, 
1972;  Singh  &  Woods,  1970).  Those  findings,  together  with  the  results  of  the 
present  study,  strongly  support  the  view  that  the  advancement  and  height  fea¬ 
tures  play  a  significant  role  in  the  perception  of  vowels  in  English.  The 
findings  are  also  consistent  with  the  larger  view  that  advancement  and  height 
enjoy  a  special  status  in  all  languages  (Lindau,  1978). 

2. Dimension 3

The third dimension of the combined group space (D3) is plotted against
D1 in the bottom half of Fig. 1. The vowels are ordered along it such that
/ɪ, æ, ɛ, ʌ, ʊ/ have negative values and /i, ɑ, ɔ, o, u/ have positive values. The
former are lax vowels, the latter tense. Hence, D3 can be interpreted as
corresponding to the tenseness feature. Unlike advancement and height, a
tenseness dimension has very rarely been recovered in vowel scaling studies. To
our  knowledge,  only  Anglin  (1971;  cited  in  Singh,  1976),  who  scaled  similarity 
judgments  for  vowels  in  /hVd/  context,  has  recovered  a  dimension  similar  to 
D3.  In  that  analysis,  the  scaling  method  did  not  yield  a  single,  interpret¬ 
able  orientation  for  the  model  of  psychological  structure.  The  present,  more 
determinate  scaling  result  might  therefore  be  taken  to  provide  the  strongest 
available  evidence  for  perceptual  significance  of  the  tenseness  feature. 

B.  Separate  analyses  of  the  conditions 

When  perceptual  judgments  for  the  isolated  and  consonantal-context  condi¬ 
tions  were  scaled  separately,  in  three  dimensions,  the  amount  of  variance  that 
could be accounted for in the data (VAF) improved marginally over its
corresponding value in the combined analysis. (VAF for analysis of the isolated
condition was 74%, that for the consonantal-context condition was 72%. This
compares with 70% in the combined analysis.) This marginal improvement
resulted from some local shifts in the positioning of vowels in the separate scaling
solutions.  As  will  be  seen,  the  global  structure  nevertheless  remained  quite 
similar  to  that  of  the  combined  analysis. 

1. The isolated condition

The perceptual dimensions for the isolated condition are shown in Fig. 2.
Only D2 is notably different from the corresponding dimension of the combined
analysis (see Fig. 1). Along this dimension, the vowels /ɛ/ and /o/ have
assumed values that are somewhat more positive than they had been previously.
The movement of /ɛ/ principally reflects the fact that /ɛ/ and /i/ were judged
to be highly similar, indeed, the most similar of all vowel pairs in the
isolated  condition.  Likewise,  the  movement  of  /o/  is  largely  dictated  by  a 
single  vowel  pairing;  /o/  and  /u/  were  judged  to  be  extremely  similar  in  iso¬ 
lation,  perhaps  reflecting  the  fact  that  they  were  the  only  two  diphthongized 
vowels  in  the  isolated  set. 

Despite  repositioning  of  these  two  vowels,  the  dimensions  of  the  isolated 
solution  maintain  a  strong  correspondence  with  the  advancement,  height,  and 
tenseness  features,  respectively. 

This  analysis  can  be  usefully  compared  with  one  by  Singh  and  Woods  (1970) 
in  which  it  was  found  that  tenseness  had  no  perceptual  significance  for 
listeners  who  rated  the  relative  similarity  of  isolated  vowels.  Those 
investigators  attributed  their  finding  to  listeners'  knowledge  that  isolated 
lax  vowels  are  phonologically  impermissible  in  English.  The  outcome  of  the 
present  study  indicates  that  there  may  have  been  other  factors  at  work  as 
well.  For  several  of  our  isolated-vowels  subjects,  the  tenseness  dimension 
(D3)  did,  indeed,  have  little  or  no  perceptual  salience,  but  for  others  it  was 
the  most  heavily  weighted  dimension  (Rakerd,  1984).  Perhaps  talkers  produced 
their  isolated  vowels  differently  in  the  Singh  and  Woods  study,  or  perhaps,  by 
averaging  their  data  over  subjects  prior  to  scaling,  Singh  and  Woods  lost  any 
statistical  evidence  of  the  significance  of  tenseness.  Whatever  the  case,  it 
is apparent that under certain conditions listeners can attend to the
tenseness dimension of isolated vowels, despite the phonological restriction.

2. The consonantal-context condition

Perceptual  dimensions  for  the  separate  analysis  of  the  consonantal-con¬ 
text  condition  are  shown  in  Fig.  3.  D1  and  D2  are  quite  similar  to  their 
counterparts  in  the  combined  analysis  (Fig.  1),  again  reflecting  sensitivity 
to  advancement  and  vowel  height,  respectively.  Along  the  third  dimension, 
there is some divergence from the combined solution, with the vowel /ɪ/ moving
in a more positive direction. This movement resulted from the fact that the
/i-ɪ/ vowel pair was judged highly similar in consonantal context.
Nevertheless, D3 retains a correspondence with tenseness.

3.  Stability  of  the  scaling  solutions 

The  agreement  among  these  separate  scaling  solutions  and  the  combined 
solution  is  evidence  of  the  stability  of  this  modeling  outcome.  Perceptual 
dimensions  closely  related  to  advancement,  height,  and  tenseness  were  recov¬ 
ered  in  all  cases,  which  makes  it  extremely  unlikely  that  their  emergence  in 
any  individual  case  was  a  coincidental  consequence  of  the  scaling  analysis  it¬ 
self. 


III.  Acoustic  Correlates  of  the  Perceptual  Dimensions 

We  computed  correlations  to  assess  the  strength  of  relationships  between 
the  perceptual  dimensions  revealed  by  our  combined  scaling  analysis  and  vari¬ 
ous  acoustic  parameters  of  the  vowel  stimuli.  The  acoustic  measurements  were 
made from wideband spectrograms. In the case of isolated vowels, center
frequencies of the first three formants (F1, F2, and F3) were measured at a
point approximately halfway through each token. Duration of voicing was also
measured for the isolated vowels.

The acoustic structure of the /dVd/ syllables comprised an onglide (or
period of syllable-initial formant transition) and an offglide (syllable-final
transition), with little or no region in which the formants could be described
as maintaining a steady-state frequency. Therefore, we adopted the convention
of measuring F1, F2, and F3 at the end of the onglide (a point also
representing the beginning of the offglide). Duration was measured from the
first evidence of voicing following initial-/d/ release to the last evidence
of voicing prior to final-/d/ closure. Last, we computed the proportion of
total syllable duration that was taken up by the offglide.

Recall that there were two tokens of each vowel in each context. The mean
parameter values for those two tokens are listed in Table 1. Isolated-vowel
parameters appear in the top half of the table, /dVd/ parameters in the bottom
half. An examination of Table 1 shows that the stimuli were acoustically
"normal" in the sense that their parameters were roughly comparable to those
that other investigators have reported for much larger data bases (Klatt,
1975; Peterson & Barney, 1952; Peterson & Lehiste, 1960; Umeda, 1975). The
data also provide evidence of vowel reduction (Joos, 1948; Lindblom, 1963) in
consonantal context. Formant frequency differences among the vowels were
smaller in the /dVd/ condition than in isolation.

Rank-order correlations (Spearman's rho) were computed between the acoustic
data reported in Table 1 and coordinates for the perceptual dimensions of the
combined analysis. The results are reported in Table 2. First consider
correlations for the isolated vowels, which appear in the top half of the
table. The following correlations (and no others) proved significant: D1
(which we have interpreted as advancement) with F2 and F3, D2 (height) with
F1, and D3 (tenseness) with duration. The findings regarding D1 and D2 are
anticipated by a number of previous scaling studies (Fox, 1982, 1983; Pols et
al., 1969; Shepard, 1972). The finding for D3 is consistent with the report
that vowel tenseness is related to duration (Peterson & Lehiste, 1960).
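
As an illustration of the rank-order statistic used here, the following is a minimal sketch of Spearman's rho, computed as the Pearson correlation of average ranks so that tied values (such as the repeated durations in Table 1) are handled. The function names are our own, and the coordinate vector is a made-up placeholder, not a set of coordinates from the scaling solution; only the F2 values are taken from Table 1 (isolated vowels).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# F2 (Hz) of the ten isolated vowels, in the row order of Table 1.
f2_isolated = [2210, 1920, 1620, 1200, 1640, 1180, 1000, 950, 1125, 875]

# Placeholder coordinate (NOT from the paper): any quantity that falls
# monotonically as F2 rises gives rho = -1, whatever its spacing.
placeholder_coord = [-(f ** 0.5) for f in f2_isolated]

rho = spearman_rho(f2_isolated, placeholder_coord)
```

Because only ranks enter the computation, the statistic is insensitive to the spacing of the formant values, which is presumably why a rank-order measure suits a comparison between scaling coordinates and raw acoustic measurements.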

The bottom half of Table 2 shows correlations for vowels in /dVd/ context.
Note that relative to the isolated vowels there is a substantial reduction in
the strength of the correlation between D1 and F2 (0.72 in magnitude, down
from 0.95) and between D1 and F3 (0.66, down from 0.84). These statistical
changes reflect the fact that the high-back vowels /o,ʊ,u/ were radically
reduced in /dVd/ context, as might be expected given the alveolar place of
articulation of the consonants. Though not unusual, this circumstance merits
comment in that it calls into question strong statements to the effect that
the relationship between the advancement feature and the formant structure of
vowels is a simple one (see, e.g., Lindau, 1978; Singh, 1976). Our finding is
one of the sort that shows that this relationship is affected by the phonetic
context in which a vowel occurs.

Table 1

Acoustic Parameters of the Stimuli

                         Formant frequencies (Hz)
                                                    Duration   Offglide
Condition      Vowel       F1       F2       F3       (ms)     proportion

Isolated         i        225     2210     2835       235        ...
vowels           ɪ        465     1920     2600       165        ...
                 ɛ        555     1620     2135       180        ...
                 æ        665     1200     2225       180        ...
                 ʌ        640     1640     2170       155        ...
                 ɑ        780     1180     2090       235        ...
                 ɔ        680     1000     2175       195        ...
                 o        515      950     2110       225        ...
                 ʊ        565     1125     2020       135        ...
                 u        395      875     2085       220        ...

Consonantal      i        330     2060     2595       120        0.50
context          ɪ        455     1795     2435        90        0.61
                 ɛ        545     1640     2515       125        0.61
                 æ      [entries from this row through /o/ are missing]
                 ʊ        460        ?        ?        95        0.60
                 u        420     1355        ?         ?          ?

? = value missing from the available copy.


Table 2

Rank-order Correlations Between Acoustic Parameters of the Stimuli and
Perceptual Dimensions of the Combined Analysis

                 Acoustic             Perceptual dimensions
Condition        parameter           D1         D2         D3

Isolated         F1                 0.12     -0.92ᵃ     -0.37
vowels           F2                -0.95ᵃ     0.30      -0.31
                 F3                -0.84ᵃ     0.19       0.09
                 Duration          -0.13     -0.17       0.76ᵃ

Consonantal      F1                 0.04     -0.95ᵃ     -0.31
context          F2                -0.72ᵃ     0.67ᵃ     -0.32
                 F3                -0.66ᵃ     0.47      -0.18
                 Duration           0.23     -0.87       0.27
                 Offglide prop.    -0.61ᵃ     0.15      -0.73ᵃ

ᵃp < 0.05


It can also be seen in Table 2 that, in the consonantal-context condition,
duration was not significantly correlated with D3 (tenseness), as it had been
with isolated vowels. It appears that judgments regarding D3 could not have
been made on the basis of vowel duration in this condition. Apparently,
subjects' perceptions of tenseness were cued by some other acoustic property
in the /dVd/ context. A likely candidate is offglide proportion, which was
significantly correlated with D3. Indeed, it is possible to account perfectly
for at least the macrostructure of D3 ordering on the basis of offglide
proportion alone. Table 1 shows that the tense vowels, which all had positive
D3 coordinates, also had offglide proportions of 50% or less, and that the lax
vowels, which all had negative D3 coordinates, also had offglide proportions
of 60% or more. This finding is reminiscent of an observation made by Lehiste
and Peterson (1961), although our measurement procedures were somewhat
different from theirs. In both instances, tense vowels were found to be marked
by a relatively brief period of offglide into a following consonant, and lax
vowels by an offglide that was more substantial in duration.
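
The tense/lax split by offglide proportion can be written as a simple threshold rule. The sketch below is illustrative only: the function name and the "unclassified" label for the gap between the two thresholds are our own assumptions, and the three proportions are the /dVd/ values for /i/, /ɪ/, and /ɛ/ legible in Table 1.

```python
def classify_by_offglide(proportion, tense_max=0.50, lax_min=0.60):
    """Label a /dVd/ vowel from its offglide proportion alone.

    Thresholds follow the text: tense vowels at 50% or less, lax vowels at
    60% or more. Values in between are left unclassified (an assumption,
    since no stimulus is reported to fall in that gap).
    """
    if proportion <= tense_max:
        return "tense"
    if proportion >= lax_min:
        return "lax"
    return "unclassified"


# Offglide proportions legible in Table 1 (consonantal context).
offglide = {"i": 0.50, "ɪ": 0.61, "ɛ": 0.61}

labels = {vowel: classify_by_offglide(p) for vowel, p in offglide.items()}
```

A rule of this kind recovers only the sign of the D3 coordinate (the "macrostructure" referred to above), not the ordering of vowels within the tense or lax group.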

A number of investigators have reported that vowels in consonantal context
are identified with greater accuracy than isolated vowels (Gottfried &
Strange, 1980; Rakerd et al., 1984; Strange, Edman, & Jenkins, 1976; Strange,
Verbrugge, Shankweiler, & Edman, 1979). It has been suggested that one reason
for  this  perceptual  advantage  may  be  that  the  dynamic  acoustic  structure  of 
syllables  is  a  unique  source  of  vowel  information  (Strange  et  al.,  1976; 
Strange,  Jenkins,  &  Johnson,  1983).  Our  observation  of  an  association  between 
offglide  proportion  and  the  tenseness  feature  is  certainly  consistent  with 
this  view. 

IV.  Summary  and  Conclusions 

A stable, interpretable individual differences scaling solution was found
for subjects' similarity judgments regarding a set of American English vowels.
This solution had three dimensions, which corresponded, respectively, to the
linguistic features of advancement, height, and tenseness. Those
correspondences provide particularly strong evidence for the perceptual
significance of the features due to the determinacy of individual differences
scaling.

While  the  results  regarding  the  advancement  and  height  features  confirm 
expectations  based  on  a  number  of  previous  scaling  studies,  recovery  of  a 
tenseness  dimension  is  more  surprising.  One  reason  for  its  recovery  in  the 
present  instance  may  have  to  do  with  the  individual  differences  scaling  method 
itself. Across subjects, there was wide variability in the perceptual
salience of tenseness, particularly among those who rated isolated vowels
(Rakerd, 1984). With individual differences scaling, this variability was
manifest in the different weighting that each subject attached to D3. However,
had the data been averaged over subjects prior to analysis, as required by
many scaling methods, it is likely that the variability would have made it
impossible to recover a tenseness dimension. It may also be relevant that we
instructed subjects to attend to those aspects of the vowel sounds that seemed
to them to distinguish words in English. Previous investigators (Carlson &
Granström, 1979; Klatt, 1979) have reported that an instruction of this type
can strengthen the linguistic character of subjects' perceptual judgments.

There  were  two  noteworthy  findings  regarding  correlations  between  the 
scaling  results  and  acoustic  parameters  of  the  vowel  stimuli.  The  first  was 
that vowel duration was not significantly correlated with the tenseness
dimension in /dVd/ context. Hence, the emergence of this dimension,
particularly in the separate analysis of the consonantal-context condition,
cannot be attributed to subjects having attended to durational differences
among the vowels.

The second observation was that in /dVd/ context tenseness was
significantly correlated with offglide proportion. Tense vowels had an
internal syllable structure in which the offglide constituted 50% or less of
the vocalic region. For lax vowels, the offglide made up 60% or more of the
vocalic region. This finding is similar to one reported by Lehiste and
Peterson (1961). The two findings together support the view that the dynamic
acoustic structure of syllables can be a unique source of vowel information
for a
perceiver (Strange et al., 1976, 1983).


Footnotes

¹Although this is most commonly the case, and was the case in the present
study, each of the several data matrices submitted to an individual
differences scaling analysis need not represent the performance of a single
subject. As alternatives, there could, for example, be one matrix for each of
the several conditions of an experiment, or one for each of the several
experiments in a study. From a computational standpoint, it is only required
that there be multiple matrices.

²The subjects had complete control over the ordering and pacing of stimulus
presentation. They directed presentation of the triad of stimuli for each
trial by pressing three different buttons on a computer terminal.

³The term advancement is used to be consistent with the earlier work of
Singh and Woods (1970), and with Rakerd (1984). An alternative, and perhaps
more common, term for this feature would be backness (Ladefoged, 1975).


Abstract. A series of experiments was conducted to examine the
perceptual stability of stop consonants cued by silence alone, as
when [s]+silence+[læt] is perceived as "splat." Following a
replication of this perceptual integration phenomenon (Exp. 1), attempts
were made to block it by instructing subjects to disregard the
initial [s] and to focus instead on the onset of the following signal,
which was varied from [plæt] to [læt]. However, these instructions
had little effect at short silence durations (Exp. 2), and
they reduced stop percepts for only two subjects at longer silence
durations (Exp. 3). That is, subjects were generally unable to
dissociate the [s] noise from the following signal voluntarily and
thus to perceive the silent interval as silence rather than as a
carrier of phonetic information. A low-uncertainty paradigm
facilitated the task somewhat (Exp. 4). However, when the [s]
frication was replaced with broadband noise (Exp. 5), listeners had no
trouble at all in the selective-attention task, except at very short
silence durations (< 40 ms). This last finding suggests that,
except for the shortest durations, the effect of silence on phonetic
perception does not arise at the level of psychoacoustic stimulus
interactions. Rather, the results support the hypothesis that
perceptual integration of speech components, including silence, is a
largely obligatory perceptual function driven by the listener's
tacit knowledge of phonetic regularities.

When  listening  to  speech  we  perceive  a  coherent  stream  of  sound,  not  a 
sequence  of  clicks,  whistles,  buzzes,  and  hisses.  In  view  of  the  many  abrupt 
changes  of  excitation  and  spectral  structure  that  take  place  in  normal  speech, 
this  apparent  auditory  coherence  might  seem  like  a  remarkable  perceptual 
accomplishment.  However,  it  may  well  reflect  the  fact  that  the  ordinary 
listener's  attention  is  not  focused  on  the  detailed  physical  properties  of  the 
speech signal but on the underlying, linguistically relevant information.
That is, auditory coherence of speech may be inferred from the perceived
actual continuity of certain underlying articulatory events. If so, then there
may  be  a  more  analytic  level  of  perception  that  is  sensitive  to  physical 
discontinuities  in  the  speech  signal. 

Speech  does  possess  certain  acoustic  features  that  promote  auditory 
coherence  of  otherwise  disparate  signal  portions.  For  example,  formant 
transitions have been considered to provide a kind of "perceptual glue" that
holds successive sounds together and helps preserve their temporal order (Cole
& Scott, 1973; Dorman, Cutting, & Raphael, 1975). This can hardly be the
whole story, however. If perceptual coherence and integration were determined
entirely by properties of the acoustic signal and their auditory transforms,
it would be impossible for a listener to decompose the speech signal into its
components deliberately. Nevertheless, this is possible, at least to a
certain extent, by focusing one's attention on the level of auditory qualities
(see, e.g., Pilch, 1979). For example, it is not difficult even for a naive
listener  to  attend  selectively  to  the  series  of  high-pitched  hisses  that  rep¬ 
resent  repeated  occurrences  of  [s]  in  the  speech  stream.  Under  special  condi¬ 
tions,  the  perceptual  isolation  of  such  auditory  components  may  be  facilitat¬ 
ed:  Cole  and  Scott  (1973)  rapidly  repeated  the  syllable  [sa]  over  and  over, 
and  listeners  soon  reported  hearing  two  separate  streams  of  sounds,  one 
consisting  of  hisses  (the  fricative  noises)  and  the  other  of  syllables  sound¬ 
ing  like  [ta]  (the  vowel  with  its  initial  formant  transitions).  In  this 
unnatural  situation,  the  segregation  may  take  place  at  a  relatively  early 
perceptual  stage;  similar  "streaming"  can  be  induced  in  repetitive  multicompo¬ 
nent  nonspeech  signals  (Bregman,  1978). 

Under  more  natural  circumstances,  the  perceptual  integration  of  certain 
disparate acoustic components of speech may still not be completely
obligatory, though it reflects the normal mode of speech perception. If
perceptual integration of these speech components could be disengaged by
manipulating listeners' interpretation of the stimulus, this would suggest
that the normally perceived coherence of the speech signal is contingent on a
nonobligatory, central function characteristic of phonetic perception. If the
integrative function proved difficult to disengage, and if low-level
psychoacoustic interactions could be ruled out as the cause of the
integration, then the conclusion would be that perceptual integration of
speech components is not only a characteristic but also an obligatory function
of phonetic perception.*

Evidence in favor of the hypothesis that certain types of perceptual
integration are speech-specific has been obtained in several recent studies
concerned with "trading relations" among acoustic cues. Thus, Best,
Morrongiello, and Robson (1981) have shown that, in noise-plus-sinewave
analogs of utterances of the type "say" versus "stay," the silent closure
interval following the noise and the onset frequency of the tone mimicking the
first formant (F1) both contribute to a stop consonant percept as long as the
stimuli are perceived as speech; however, when the stimuli are perceived as
nonspeech, the two acoustic cues are no longer integrated and are perceived as
unrelated auditory properties. In another study, Repp (1981) trained subjects
to discriminate the pitch of fricative noises preceding different vowels
containing one of two sets of formant transitions. There was no effect of the
vocalic context on the subjects' pitch judgments, even though the phonetic
identification of the fricative consonant was influenced by both vowel quality
and formant transitions. Furthermore, Dorman, Raphael, and Liberman (1979)
and Rakerd, Dechovitz, and Verbrugge (1982) experimented with utterances whose
precise phonetic interpretation depended on the duration of a silent closure
interval occurring at a syllable boundary. When either fundamental frequency
(Dorman et al., 1979) or the intonation contour (Rakerd et al., 1982) was
changed abruptly across syllables, the silence lost its perceptual effect.
Although spectral discontinuity could have played a role here, circumstantial
evidence suggests that subjects' perception of one versus two speakers or
utterances was responsible for the effect. Thus, all the studies cited
provide evidence for a central level of perceptual integration that can be
disengaged in at least three ways: by leaving the speech mode altogether, by
selectively attending to specific auditory properties of the speech signal, or
by perceiving a change of source or of linguistic structure.

In  the  present  research,  the  focus  is  on  the  perceptual  integration 
occurring  in  [spl]  clusters.  Acoustic  cues  to  the  perception  of  a  labial  stop 
consonant  in  this  context  include,  first  and  foremost,  an  interval  of  silence 
following  the  [s]  noise  (Bastian,  Eimas,  &  Liberman,  1961;  Fitch,  Halwes, 
Erickson  &  Liberman,  1980),  but  also  spectral  changes  in  the  fricative  noise 
and  the  amplitude  contour  at  noise  offset  (Summerfield,  Bailey,  Seton,  &  Dor¬ 
man,  1981),  the  duration  of  the  [s]  noise  (Repp,  1984c),  the  presence  and  am¬ 
plitude  of  a  release  burst  following  the  silent  closure  (Repp,  1984b,  1984d), 
formant  onset  frequencies  and  transitions  in  the  following  voiced  portion 
(Fitch  et  al.,  1980;  see  also  Bailey  &  Summerfield,  1980),  and  the  duration 
and  possibly  the  amplitude  envelope  of  the  voiced  portion  (Repp,  1984c).  Of 
special interest here is the finding (Dorman et al., 1979) that a percept of
"split" can be elicited by simply concatenating an [s] noise and a [lɪt]
syllable, with an appropriate interval of silence (about 100-300 ms) in between;
in  other  words,  in  this  context  silence  alone  can  be  a  sufficient  cue  for  the 
perception  of  a  "p,"  as  long  as  there  are  no  contradictory  cues  from  the 
surrounding  signal  portions.  Since  neither  of  the  energy-carrying  signal  por¬ 
tions  in  isolation  contains  sufficient  cues  to  a  "p,"  and  the  silence  by  it¬ 
self  naturally  does  not  either,  the  stop  consonant  percept  in  this  case  is  a 
pure  product  of  perceptual  integration  over  time  and  thus  constitutes  an  ideal 
test  case  for  our  purposes. 

The  question  addressed  in  the  present  study  is:  How  robust  is  this 
perceptual  integration  effect — that  is,  can  a  listener  deliberately  avoid  the 
stop  consonant  percept  and  hear  the  stimulus  components  the  way  they  sound  in 
isolation,  for  example,  as  "s"  followed  by  "lit"?  This  question  is  not  unrea¬ 
sonable  because  a  stop  cued  by  silence  alone  does  not  sound  perfectly  natural 
and  might  be  expected  to  be  perceptually  unstable,  almost  an  illusion.  The 
answer  to  the  question  also  bears  on  two  contrasting  hypotheses  that  have  been 
put  forward  to  account  for  perceptual  integration  and  cue  trading  relations  in 
phonetic  perception  (see  Pastore,  1981;  Repp,  1982):  If  these  phenomena  are  a 
function  of  purely  psychoacoustic  stimulus  properties  that  emerge  in  peripher¬ 
al  auditory  processing,  then  it  should  be  extremely  difficult  to  disengage 
them  through  acts  of  selective  attention  or  linguistic  restructuring.  If  they 
are  a  function  of  speech-specific  mechanisms,  however,  it  might  be  possible  to 
change them by manipulating listeners' interpretation of the stimulus, without
necessarily leaving the speech mode. A positive result would simultaneously
refute the psychoacoustic hypothesis and support the existence of a special
integrative  level  of  perception,  whereas  a  negative  result,  to  be  interpret¬ 
able,  would  require  an  additional  demonstration  that  psychoacoustic  interac¬ 
tions  are  not  the  cause  of  the  subjects'  difficulty. 

Accordingly, this paper reports several attempts to "get rid of the stop"
in subjects' perception of [s]+silence+[lɪt] = "split" type utterances by
directing their attention to the stimulus portion following the silence. A
replication of the basic phenomenon of silence-cued stop consonant perception
(Exp. 1) is followed by experiments that investigate the effect of selective
attention instructions for stimuli with different absolute silence durations
(Exps. 2 and 3), and with some subsequent changes in test format to reduce
stimulus uncertainty (Exp. 4). Since, as will be seen, the stop consonant
percepts proved unexpectedly resistant to these manipulations, the last
experiment (Exp. 5) aimed at ruling out psychoacoustic interactions as the
cause of the silence-cued stop percept. On the assumption that this last
study succeeded in its aim, the conclusion will be that perceptual integration
of speech components, in this instance at least, is a relatively compulsory
function of phonetic perception.


Experiment  1 

Experiment 1 was an attempt to replicate an earlier striking demonstration
of the perceptual integration phenomenon of interest, reported by Dorman et
al. (1979, Exp. 3). These authors concatenated natural [s] and [lɪt]
utterances that had been recorded in isolation and that were considered to
contain no traces of any [p]. When the silent interval between the stimulus
components was shorter than 60 ms, listeners uniformly reported "slit." At
silent intervals between 80 and 450 ms, however, listeners reported
predominantly "split," with a maximum of over 90 percent around 300 ms of
silence. This optimal closure interval was much longer than a typical [p]
closure in this context (about 90 ms; see Morse, Eilers, & Gavin, 1982);
moreover, it took as much as 650 ms of silence before subjects uniformly
reported hearing "s-lit" (i.e., "s" followed by "lit"), rather than "split."
Since the "p" percepts in such stimuli are sometimes not very convincing, a
replication of the Dorman et al. study seemed advisable, to verify that their
subjects' "p" percepts were not just phantoms.

The long optimal closure duration (300 ms) in the Dorman et al. experiment
may have been due to perceptual compensation for the absence of other cues to
stop manner. However, there is also the possibility that the use of a wide
range of closure durations (0-650 ms), combined with a higher relative
frequency of short intervals, promoted a bias toward reporting "split" at
atypically long closure durations. Therefore, two different stimulus ranges
were employed here to assess the effect of this variable on the "sl"-"spl" and
"spl"-"s-l" boundaries. The stimuli in this part of the experiment (1a) began
with a fricative noise that contained some positive stop manner cues and that
was also used in Experiments 2-4. To approximate the conditions of the Dorman
et al. (1979) study even more closely, the test employing a wide range of
closure durations was later repeated (1b) using a fricative noise without
positive stop manner cues.

Method 

Subjects. Nineteen paid volunteers served as subjects, 10 in Experiment
1a and 9 in 1b. They were Yale undergraduates and native speakers of American
English.

Stimuli. A female speaker recorded several repetitions of the utterance
[splæt] ("splat"). One good token was low-pass filtered (-3 dB at 9.6 kHz,
-55 dB at 10 kHz) and digitized at a 20 kHz sampling rate. Because this
speaker's fricative noises contained significant energy at frequencies above
10 kHz, which caused some digitization artifacts, digitization and subsequent
recording of audio tapes were done at half speed. The [s] noise was 125 ms
long. The silent closure interval and the initial 11.5 ms of the following
stimulus portion, corresponding to the labial release burst (and perhaps
including a weak first glottal pulse), were removed. The remaining portion in
isolation elicited over 90 percent "lat" responses (see Exps. 2 and 3,
pretest). Thus it did not seem to contain any sufficient cues to a preceding
labial stop. The fricative noise from [splæt], however, may have contained
such cues. Therefore, Experiment 1b used a fricative noise derived from an
utterance of [slæt] produced by the same speaker, 190 ms in duration.^

Two identification tests were assembled for Experiment 1a. In one, the
[s] noise was followed by the [læt] portion at each of 14 different closure
durations: 0, 20, 40, 60, 80, 100, 150, 200, 250, 300, 400, 500, 600, and 700
ms. This test was also duplicated in Experiment 1b with the different [s]
noise. In the other test used in Experiment 1a, only the 9 closure durations
up to 250 ms were included. Each test contained 10 successive randomizations
of the stimuli, with interstimulus intervals (ISIs) of 2.5 s and interblock
intervals of 6 s. The stimulus sequences were recorded at half speed on audio
tape using high-quality equipment, with closure durations and ISIs at twice
their nominal values; thus they had the intended values at playback speed.
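
The assembly of the two identification tests can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names, the use of a seeded pseudorandom shuffle, and the constant names are ours; the closure durations, block count, ISI, and the doubling of values for half-speed recording are taken from the description above.

```python
import random

# Closure durations (ms) of the wide-range test, as listed in the text.
WIDE_RANGE_MS = [0, 20, 40, 60, 80, 100, 150, 200, 250,
                 300, 400, 500, 600, 700]
# The narrow-range test keeps only the durations up to 250 ms.
NARROW_RANGE_MS = [d for d in WIDE_RANGE_MS if d <= 250]

N_BLOCKS = 10          # successive randomizations per test
ISI_MS = 2500          # nominal interstimulus interval
TAPE_SPEED_FACTOR = 2  # recorded at half speed, so tape values are doubled


def build_test(closures_ms, n_blocks=N_BLOCKS, seed=0):
    """Return n_blocks independent random orders of the closure durations.

    The seed is a hypothetical convenience for reproducibility.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(closures_ms)
        rng.shuffle(block)
        blocks.append(block)
    return blocks


def recorded_value(nominal_ms, factor=TAPE_SPEED_FACTOR):
    """Duration to lay down on tape so playback at full speed is nominal."""
    return nominal_ms * factor


wide_test = build_test(WIDE_RANGE_MS)
narrow_test = build_test(NARROW_RANGE_MS)
```

For example, a nominal 250-ms closure would be recorded as 500 ms at half speed, and a 2.5-s ISI as 5 s, so that both return to their intended values at playback speed.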

Procedure. The subjects listened individually or in small groups over
TDH-39 earphones in a quiet room. They identified each stimulus in writing as
beginning with "sl," "spl," or "s-l" (i.e., "s" followed by silence and
"lat").

Results  and  Discussion 

The average percentage of stop (i.e., "spl") responses is plotted in
Figure 1 as a function of closure duration (on a logarithmic scale). Filled
and open circles represent the data from the two conditions of Experiment 1a.
It is evident that stimuli with short closure intervals were perceived as
beginning with "sl." The "sl"-"spl" boundary fell at about 70 ms of closure
duration. "Spl" responses were obtained for closure intervals ranging from 60
to 300 ms, with the peak occurring at 100-150 ms of silence. At longer closure
durations, an increasing number of "s-l" responses was obtained.* Truncation
of the stimulus range did not affect the "sl"-"spl" boundary but shortened the
"spl"-"s-l" boundary by about 80 ms. At closure intervals of 200 and 250 ms
combined, there were significantly fewer "spl" responses in the narrow-range
than in the wide-range condition (one-way repeated-measures ANOVA: F(1,9) =
26.25, p = .0006). The "spl"-"s-l" distinction is not very categorical and
was expected to be affected by stimulus range. The fixed "sl"-"spl" boundary,
on the other hand, suggests that the silence-cued "p" percepts at closure
durations below 150 ms were relatively stable and insensitive to range
effects.

The results from Experiment 1b are represented by the triangles in Figure
1. They confirm that the fricative noise in Experiment 1a contained some
positive stop manner cues. The "sl"-"spl" boundary was at a longer silent
interval here (close to 100 ms), the maximum of "spl" responses was less
pronounced and occurred at longer silences (150-250 ms), and the subjects
experienced more uncertainty at the longest intervals, giving more "spl"
responses here than in Experiment 1a. All these differences are at least in
part due to the longer duration of the fricative noise used in Experiment 1b
(cf. Repp, 1984c), but spectral differences at noise offset may also have
played a role.

The general pattern of these results is consistent with the findings of
Dorman et al. (1979). That is, even without any strong stop manner cues in
the surrounding signal portions, "p" percepts are obtained in a certain range
of closure durations. The 70 ms boundary separating "sl" from "spl" responses
in Experiment 1a is very close to that obtained by Dorman et al. The results
of Experiment 1b resemble the Dorman et al. findings in terms of the optimal
closure duration for hearing "p"; they suggest that listeners need
exceptionally long closure intervals for stop perception when closure duration
is the sole stop manner cue, perhaps to compensate for the absence of other
cues. The optimal closure duration in Experiment 1a, however, is shorter than
in the Dorman et al. study, and so is the longest closure at which "p"
percepts were still obtained. These results are somewhat closer to reflecting
the typical closure durations observed in natural speech.

Figure 1. Percent stop (i.e., "spl") responses as a function of closure
duration in Experiments 1a (filled and open circles) and 1b (triangles).
The open circles represent the results from the condition with a reduced
range of closure durations.

Experiment  2 

Even though Experiment 1 demonstrated the perceptual reality of
silence-cued stop consonants, it did not tell us how obligatory these percepts
are. The fact that the percentage of "spl" responses did not reach 100
percent at any closure duration suggests a certain amount of ambiguity. Subjects
may also have felt compelled to apply the "spl" response category supplied by
the experimenter. How easy would it be to convince listeners that what they
are hearing is really "s" followed by "lat," and not "splat"? The technique
adopted to investigate this issue in the following experiments was to
construct a continuum from [plaet] to [laet], to prefix it with an [s] noise
plus a varying silent interval, and to instruct listeners either to identify
the whole stimulus ("integrative" condition) or to ignore the [s] and identify
only the part following the silence ("analytic" or selective-attention
condition). Since the test included clear [splaet] (i.e., [s]+silence+[plaet])
stimuli, there was no pressure to give any stop responses to
[s]+silence+[laet] stimuli. On the contrary, contrast among stimuli in the
test should reduce any such tendencies. The analytic instructions were
reinforced by the use of the response "b" (actually, "bl") for the
syllable-initial labial stop, if one was perceived, as contrasted with "p"
(actually, "spl") in the integrative condition. Note that the analytic
instructions required a perceptual reinterpretation within the linguistic
domain, without leaving the speech mode (although thinking of the [s] as some
extraneous noise might help). If the instructions were effective, fewer stop
responses should be obtained in the analytic than in the integrative condition
at closure durations beyond 100 ms, particularly for those stimuli whose final
portion was perceived as beginning with "l" in isolation.

The "stop generation effect" discussed so far (the introduction of a stop
percept by appropriate amounts of silence in the absence of any other
sufficient cues) may be contrasted with a "stop suppression effect" due to an
absence of a sufficient interval of silence in the presence of other sufficient
cues. Thus, earlier observations (e.g., Fitch et al., 1980; Mann & Repp,
1980) lead to the expectation that stimuli perceived as beginning with "bl" in
isolation will lead to "sl" responses when preceded by an [s] noise with
little or no silence in between. If this stop suppression effect reflected the
same higher-level, integrative mechanisms as the stop generation effect, and
if analytic listening instructions were effective, then more stop responses
should be obtained in the analytic than in the integrative condition at short
closure durations, particularly for those stimuli whose final portion was
perceived as beginning with "bl" in isolation.

Thus, the strongest prediction for Experiment 2 is that silent closure
duration will have a marked effect on stop perception in the integrative
listening condition but no effect at all in the analytic condition: Stimuli
should be labeled as if there were no preceding [s]. However, apart from the
fact that it is more realistic to expect only a more or less pronounced
tendency in the predicted direction, the stop generation and suppression
effects may well be differentially sensitive to attentional strategies. The
stop suppression effect, which results from signal components occurring in
close succession, is much more likely to involve auditory interactions (such as
forward masking) than the stop generation effect, which results from components
that are more widely separated in time. If this notion is correct, then the
prediction should be that selective attention instructions, if effective, will
lead to a reduction of stop percepts at longer silences but not to an increase
of stop percepts at short silences.

Method 


Subjects. The same 10 subjects as in Experiment 1a participated.

Stimuli. A continuum from [plaet] to [laet] was constructed from the
source utterance used in Experiment 1a, [splaet]. The original 11.5 ms labial
release burst was truncated by 0, 2, 4, 7.5, or 11.5 ms, yielding five stimuli
intended to range perceptually from "blat" to "lat" in the absence of a
preceding [s].* The cutpoints were placed at zero-crossings in the digitized
waveform. A brief pretest was assembled in which these five stimuli (without
any preceding [s]) occurred 10 times in random sequence, with ISIs of 2.5 s.

Two additional identification tests were assembled. In one, designed for
integrative listening, each stimulus from the [plaet]-[laet] continuum was
preceded by [s] at silent intervals of 0, 40, 80, 120, and 160 ms, for a total
of 25 stimuli that were recorded 10 times in random sequence with ISIs of 2.5
s. The other test, designed for analytic listening, contained 10 random
sequences of the same 25 stimuli plus 10 x 2 replications of the 5 stimuli
without a preceding [s] interspersed among them, resulting in 10 35-item blocks.
The "no-[s]" stimuli were intended to remind the subjects of the stimulus
portion to attend to, and perhaps to facilitate selective attention.
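
The block structure of the two tests can be sketched as follows. This is an
illustration under stated assumptions rather than a record of how the tapes
were made: the tuple representation of a stimulus, the use of `None` to mark
a missing [s] precursor, and the function names are hypothetical. The counts
(25 items per integrative block, 35 per analytic block) follow from the
description above.

```python
import itertools
import random

STIMULI = [1, 2, 3, 4, 5]             # positions on the [plaet]-[laet] continuum
SILENCES_MS = [0, 40, 80, 120, 160]   # silent interval after the [s] noise


def integrative_block(rng):
    """One random order of the 5 x 5 = 25 [s] + silence + stimulus items."""
    block = list(itertools.product(STIMULI, SILENCES_MS))
    rng.shuffle(block)
    return block


def analytic_block(rng, no_s_reps=2):
    """The same 25 items plus no_s_reps copies of each no-[s] stimulus.

    A silence of None marks a stimulus presented without preceding [s]
    (a hypothetical encoding chosen for this sketch).
    """
    block = list(itertools.product(STIMULI, SILENCES_MS))
    block += [(stimulus, None) for stimulus in STIMULI for _ in range(no_s_reps)]
    rng.shuffle(block)
    return block


rng = random.Random(0)  # fixed seed, assumed only for repeatability
integrative_test = [integrative_block(rng) for _ in range(10)]
analytic_test = [analytic_block(rng) for _ in range(10)]
```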

Procedure. All subjects listened first to the tapes of Experiment 1a.
Subsequently, in the same session, the integrative listening test was
presented. As in Experiment 1a, the task was to label the stimuli as beginning
with "sl" or "spl." The pretest followed, with instructions to label the
stimuli as beginning with "bl" or "l." Finally, the analytic listening test was
presented, in which the labels "bl" and "l" were again to be used. Subjects
were told to ignore the [s], if present, to the best of their ability. They
were informed about the structure of the stimuli and about the perceptual
effect to be avoided.

Results  and  Discussion 

The [plaet]-[laet] continuum was perceived as intended. In the pretest,
the average percentages of "bl" responses to the 5 stimuli were 100, 100, 90,
9, and 3, respectively. (Note the listeners' remarkable sensitivity to the
3.5 ms release burst cutback occurring between stimuli 3 and 4; for comparable
results, see Repp, 1984b: Exp. 1.) The same no-[s] stimuli interspersed in
the analytic listening test received 99, 99, 92, 24, and 20 percent "bl"
responses, respectively. Thus, stimuli 4 and 5 were sometimes perceived as
beginning with "bl" in this environment, but they still were clearly
distinguished from stimuli 1, 2, and 3, which sufficed for the purposes of this
experiment.
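
The 3.5 ms cutback mentioned above follows directly from the truncation
values given in the Method section. The arithmetic can be sketched as below;
the variable names are introduced here for illustration only.

```python
BURST_MS = 11.5                          # original labial release burst
TRUNCATIONS_MS = [0, 2, 4, 7.5, 11.5]    # amount removed for stimuli 1-5

# Burst duration remaining in each stimulus of the continuum.
remaining_ms = [BURST_MS - cut for cut in TRUNCATIONS_MS]

# Additional cutback between neighboring stimuli (1-2, 2-3, 3-4, 4-5).
step_ms = [b - a for a, b in zip(TRUNCATIONS_MS, TRUNCATIONS_MS[1:])]

print(remaining_ms)  # [11.5, 9.5, 7.5, 4.0, 0.0]
print(step_ms)       # [2, 2, 3.5, 4.0]
```

The third step is the 3.5 ms cutback between stimuli 3 and 4, across which
"bl" responses in the pretest fell from 90 to 9 percent.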

In both the integrative and analytic listening conditions, stimuli with
no closure silence at all never elicited labial stop responses. Clearly,
analytic listening instructions were totally ineffective here, which was not
an unexpected result. Therefore, those data were excluded from further
analysis, reducing the number of closure durations to 4. Figure 2 shows the
percentages of labial stop responses in the two listening conditions as a
function of closure duration and of stimulus number on the continuum. The
responses to no-[s] stimuli in the analytic test are plotted on the far right.

Figure 2. Percent stop responses in the integrative and analytic conditions
of Experiment 2, separately for the five stimuli from the
[plaet]-[laet] continuum. Data for the 0 ms closure duration are
omitted.


It is evident that the response patterns in the integrative and analytic
conditions were highly similar. A repeated-measures ANOVA showed the expected
significant main effects of closure duration and stimulus continuum, and also
an interaction between these factors (all p's < .0001), but no significant
main effect of conditions. The conditions by closure duration interaction was
significant, F(3,27) = 5.45, p < .005, due to a slight reduction in labial
stop percepts at the shorter closure durations in the analytic condition
relative to the integrative condition, and a relative increase at the longest
closure duration, where perceptual segregation of the [s] noise from the rest
of the stimulus might have been expected to be relatively easier. This pattern
of results is the opposite of the predicted one. Thus there is no evidence
that the analytic listening instructions had the desired effect. Instead of
selectively attending to the stimulus portion following the silence, the
subjects apparently responded by parsing off the "s" and changing the "p" to
"b" in their phonological (or orthographic) representation of the whole
stimulus.

The peak rate of labial stop responses to stimuli 4 and 5 preceded by [s]
(about 70 percent at 120 ms of silence in both conditions) clearly exceeded
that for stimuli 4 and 5 in isolation, but was lower than that in Experiment
1a (about 90 percent). This may suggest unstable "p" percepts, but the
results of the analytic condition do not bear this out. That is, the
instability was only in the choice of response from one trial to the next, not
in the percept on which it was based.

It is interesting to note that stimuli 1, 2, and 3, which tended to give
very similar results at longer closures and in isolation (probably due to a
ceiling effect), elicited different response rates at the 40 ms closure
duration. In fact, an orderly trading relation can be seen between stimulus
number (i.e., degree of release burst truncation) and silent closure duration,
as previously demonstrated by Repp (1984b, Exp. 1) for alveolar stops in the
"say"-"stay" contrast. The "sl"-"spl" boundary (50 percent intercept) ranged
from approximately 30 ms (stimulus 1, extrapolated) to over 90 ms of silence
(stimulus 5), a remarkable range, considering that the release burst being
truncated was only 11.5 ms long. A lot of silence was needed to compensate
for the loss of a small piece of plosive noise.
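
A 50 percent intercept of this kind is commonly estimated by linear
interpolation between the two closure durations that bracket the 50 percent
point. The sketch below shows one such estimate; the text does not state how
the intercepts were actually obtained, so the method, the function name, and
the response percentages in the example are all assumptions chosen only to
show the computation.

```python
def boundary_50(durations_ms, percent_stop):
    """Closure duration at the first upward 50 percent crossing.

    Uses linear interpolation between neighboring data points and returns
    None when the function never rises through 50 percent within the
    tested range (in which case extrapolation would be required).
    """
    points = list(zip(durations_ms, percent_stop))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 50 <= y1:
            return x0 + (50 - y0) * (x1 - x0) / (y1 - y0)
    return None


durations = [40, 80, 120, 160]       # closure durations retained in Exp. 2
hypothetical = [10, 40, 70, 60]      # made-up percentages, not data from the study
estimate = boundary_50(durations, hypothetical)   # about 93.3 ms

# A function already above 50 percent at the shortest tested closure has
# no crossing within the range, so the boundary must be extrapolated.
no_crossing = boundary_50(durations, [60, 85, 95, 95])   # None
```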

Experiment  3 

Experiment 2 suggests that, at least without special training, subjects
are unable to dissociate an [s] noise perceptually from the following speech
signal. In part, this may have been due to the relatively short silent
intervals used. Experiment 3 examined the same issue at longer closure
durations, where selective attention to the stimulus portion following the [s]
might be facilitated by the increased temporal separation and the consequent
reduction of any potential auditory stimulus interactions across the silence.
Experiment 3 used only an analytic listening condition, taking the integrative
identification data of Experiment 1a for comparison. Since the closure
intervals used were all in the range beyond the stop suppression effect, the
expectation was that stop responses would be reduced relative to Experiment 1a
and would approximate the percentages for no-[s] stimuli.

Method 

Subjects. Ten paid volunteers participated, four of whom had taken part
in Experiments 1a and 2.

Stimuli. The test sequence contained the five stimuli from the
[plaet]-[laet] continuum preceded by the [s] noise at silent intervals of 100,
150, 200, 250, 300, 400, and 500 ms. The resulting 35 stimuli were augmented
by 4 repetitions of the 5 stimuli without preceding [s], and all 55 stimuli
were recorded in 5 randomized orders with ISIs of 2.5 s. The pretest of
Experiment 2 (no-[s] stimuli only) was also used.

Procedure. Six of the subjects first listened to the pretest, labeling
each stimulus as beginning with "bl" or "l." (The four remaining subjects had
received the pretest in an earlier session in connection with Experiment 2.)
Following the pretest, all subjects went through Experiment 4 (described
below) before embarking on Experiment 3. The instructions were to ignore the
initial [s], if present, and to label each stimulus as beginning with either
"bl" or "l." The subjects were informed about the purpose of the experiment
and about the nature of the stimuli.

Results and Discussion

The average percentages of labial stop responses to the five stimuli in
the pretest were 100, 100, 89, 16, and 10, respectively. For the same stimuli
in the analytic identification test, subjects' average percentages were 99,
99, 78, 13, and 8. Unlike Experiment 2, there was no increase in "bl"
responses to stimuli 4 and 5 in the environment of stimuli with initial [s],
perhaps because there were no contextual stimuli that sounded like "slat."

Figure 3 plots "bl" responses to stimuli preceded by [s] as a function of
silent closure duration. The response percentages for the interspersed no-[s]
stimuli are plotted on the far right. Several patterns are evident in the
results: (1) Stimuli 1, 2, and 3 elicited fewer stop responses when preceded by
[s] than when presented in isolation. (2) At closure durations shorter than
300 ms, stimuli 4 and 5 elicited more stop responses when preceded by [s] than
when presented in isolation. (3) The percentage of stop responses increased
as closure duration decreased, reaching a peak at 150 ms for stimuli 3, 4, and
5. Responses to stimuli 1 and 2, on the other hand, were not sensitive to
changes in closure duration. In the analysis of variance, this was reflected
in a significant closure duration by stimulus number interaction, F(24,216) =
2.09, p < .005.


Figure 3. Percent stop responses in the analytic task that constituted
Experiment 3.

The main result of this study is the increase in stop responses when
[laet]-like stimuli were preceded by [s] at closure durations of less than 300
ms. This increase resembles the results of Experiment 1a, obtained with
stimulus 5 in a standard (integrative) labeling task. Thus, as in Experiment 2,
subjects were not able to get rid of stop percepts by ignoring the [s]
precursor and focusing their attention on the onset of the stimulus portion
following the closure silence. Some measure of success in the
selective-attention task is indicated, perhaps, by the fact that stop responses
to stimulus 5 preceded by [s] reached a maximum of only 50 percent, whereas the
same stimulus elicited as much as 90 percent stop responses in Experiment 1a.
However, in the integrative condition of Experiment 2, there was also a
relatively low percentage of stop responses to stimulus 5 at comparable closure
durations (about 60 percent). Moreover, since subjects had been told that a
preceding [s] tended to generate labial stop percepts that were to be avoided,
a bias against responding "bl" may have operated. This is strongly suggested by
the lowered rate of "bl" responses (around 80 percent) to stimuli 1 and 2
preceded by [s], which certainly would have been labeled "spl" 100 percent of
the time in an integrative task. Thus, the effect of the selective-attention
instructions on perceptual organization may actually have been rather small
(see discussion of Figure 5 below).

This conclusion must be qualified immediately, however, because closer
inspection of the data revealed considerable individual differences (in
contrast to Experiment 2). In particular, there were 2 (out of 10) subjects who
appeared to be totally successful in ignoring the [s] precursor, whose
labeling responses were not influenced by closure duration, and who exhibited
no response bias.* Four or five other subjects showed patterns of which Figure
3 is representative, and the remaining subjects exhibited idiosyncratic
patterns and showed large response biases against "bl." These individual
differences are reminiscent of those observed by Repp (1981) in a study that
required listeners to dissociate a fricative noise perceptually from a
following vocalic portion. The success of two subjects in the present study
suggests that analytic listening to speech components is not an impossible
task, at least not when the closure durations are fairly long. These
observations are consistent with the hypothesis that silence-induced stop
percepts are products of a higher-level integrative process, and not of
psychoacoustic interactions among stimulus components. Nevertheless, the fact
remains that the perceptual strategy for performing the selective attention
task was not available to most listeners, even though they had received a
moderate amount of training by performing the low-uncertainty task of
Experiment 4 before Experiment 3.

Experiment 4

Experiments 2 and 3 have provided only very limited evidence that
subjects can perceptually dissociate the two stimulus components, even at
relatively long temporal separations. In part, subjects' difficulties in
carrying out the selective-attention instructions may reflect ingrained habits
of integrative phonetic processing when listening to speech. At very short
temporal separations, however, psychoacoustic interactions among the stimulus
components may come into play, and these interactions may be truly impossible
to disengage by acts of selective attention or other perceptual strategies. To
investigate this issue further, Experiment 4 employed a low-uncertainty
paradigm to test subjects' ability to distinguish between clear instances of
[plaet] and [laet] when preceded by [s] at various fixed intervals of silence.
It was expected that a reduction in stimulus uncertainty would facilitate the
selective attention task.

Method

Subjects.  The  same  10  subjects  as  in  Experiment  3  participated. 

Stimuli. Only stimuli 1 and 5 from the [plaet]-[laet] continuum were
used, as well as the [s] noise derived from the natural [splaet]. Seven
stimulus sequences were recorded, each containing 20 repetitions of stimuli 1
and 5 in random order, with ISIs of 2 s. In the first sequence, there was no
preceding [s] noise. In the subsequent sequences, each stimulus was preceded
by [s] at a fixed silent interval. Over these six sequences, the closure
interval decreased from 500 to 200, 100, 50, 20, and finally 0 ms.

Procedure. The subjects were told that, in each block of 40 stimuli,
half were "blat" and half were "lat." They were asked to label each stimulus
as beginning with "bl" or "l," guessing if necessary, and to ignore the [s]
precursors. Note that Experiment 4 preceded Experiment 3.

Results  and  Discussion 

Figure 4 shows the effect of [s] precursors at various closure durations,
with the no-[s] stimuli on the far right. Labeling of the two stimuli without
the [s] precursor was virtually perfect. Reading the graph from right to
left, it can be seen that discrimination of stimuli 1 and 5 (in terms of the
difference in "bl" responses) was unaffected at the 500-ms interval, then
decreased but stayed fairly high up to the 50 ms separation; then it declined
rapidly and reached chance at 0 ms (51.5 percent correct responses in terms of
identification of stimulus 1 as "bl" and of stimulus 5 as "l"). Although the
subjects had been encouraged to guess even if all stimuli sounded like "lat,"
few followed these instructions. The low percentage of stop responses at the
shortest closure durations reflects the fact that [s] + [plaet] sounds like
"slat" when there is no closure silence.^

Individual differences were evident in this task also. Three subjects,
including the two who stood out in Experiment 3, performed almost perfectly
down to 20 ms of silence, where they suddenly gave only "l" responses and thus
performed at chance level. The other subjects were more error-prone at silent
intervals of 50-200 ms, and one subject seemed to reverse the response
categories.

To determine how subjects' performance in the low-uncertainty task of
Experiment 4 compared with the performance obtained in Experiments 2 and 3, d'
values for the stimulus 1 vs. stimulus 5 discrimination (treating the binary
category labels as if they were "yes" and "no" responses in a signal detection
task) were computed from the overall response percentages, a rough measure
that is nevertheless adequate for an informal graphic comparison.* These d'
values are plotted in Figure 5. The figure suggests that discrimination was more
accurate in Exp. 4 than in Exp. 3, presumably due to the paradigm that reduced
stimulus uncertainty and thus facilitated selective attention. It also seems,
however, that at silent intervals in the range of 40-100 ms, there was no
difference in accuracy between Exps. 2 and 4. (It is also clear that there
was no difference between the integrative and analytic conditions in Exp. 2.)
Since performance in Exps. 2 and 3 matched at intervals of 100-160 ms, there
is no reason to assume that the subjects in Exp. 2 were especially accurate.
Rather, it seems that the procedure of Exp. 4, though it was beneficial at
longer closure durations, conferred no advantage in the vicinity of the
"sl"-"spl" category boundary (between 40 and 100 ms of silence; see Fig. 1).


This observation, together with subjects' extremely poor performance at
very short closure durations, is compatible with the hypothesis that the stop
suppression effect, and with it the "sl"-"spl" category distinction, rests on
a psychoacoustic interaction that cannot be disengaged through selective
attention. The silence-cued "p" percepts (the stop generation effect) at
intervals beyond 100 ms, on the other hand, are sensitive, to some extent, to
listeners' strategies and thus may represent a higher-level integrative
process peculiar to phonetic perception. The comparisons in Figure 5 suggest,
furthermore, that discriminative sensitivity is heightened in the category
boundary region, whereas discrimination at silent intervals characteristic of
strong "p" percepts (i.e., within-category discrimination) is less accurate
and requires the overcoming of integrative phonetic processing strategies.
This pattern of results is similar to that obtained in many studies of
categorical perception (see Repp, 1984a).
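
For readers who wish to reproduce a d' comparison of this kind, the
following is a minimal sketch of the standard computation, d' = z(H) - z(F).
It assumes that "bl" responses to stimulus 1 are treated as hits and "bl"
responses to stimulus 5 as false alarms, and it clips percentages of 0 and 100
to 1 and 99 so that the z transform stays finite. The clipping rule, the
function name, and the example percentages are assumptions made here, not
details reported for the present analysis.

```python
from statistics import NormalDist


def d_prime(hit_pct, false_alarm_pct, floor=1.0, ceiling=99.0):
    """Return d' = z(hit rate) - z(false-alarm rate) from percentages.

    Percentages are clipped to [floor, ceiling] before conversion, because
    the inverse normal transform is undefined at 0 and 100 percent. The
    clipping bounds are an assumption of this sketch.
    """
    hit = min(max(hit_pct, floor), ceiling) / 100.0
    false_alarm = min(max(false_alarm_pct, floor), ceiling) / 100.0
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit) - z(false_alarm)


# Hypothetical percentages of "bl" responses, for illustration only.
moderate = d_prime(90, 10)   # roughly 2.56
chance = d_prime(50, 50)     # 0.0: the two stimuli are not discriminated
```

Because the measure is computed from pooled percentages rather than from each
subject's responses, it is suited to the informal graphic comparison described
above and not to statistical inference.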


Experiment  5 


The hypothesis that the "sl"-"spl" boundary (more specifically, the
suppression of a stop percept at short closure durations) has a psychoacoustic
origin, although consistent with the data so far, is contradicted by a recent
study of Pastore, Szczesiul, and Rosenblum (1984). These researchers employed
binaural phase shifts to differentially lateralize the [s] and [plIt]
components of their "slit"-"split" stimuli. This manipulation left the category
boundary (located at 68 ms of closure silence in their study) completely
unaffected. The authors argued that differential lateralization should reduce
psychoacoustic interactions between the stimulus components and that,
therefore, the absence of an effect suggests that the "sl"-"spl" boundary does
not rest on a psychoacoustic criterion. However, apart from the possibility that
the phase shift technique was too weak a manipulation to remove psychoacoustic
interactions, these results do not rule out such interactions at closure
intervals shorter than the boundary value.


An additional experiment probing the possible psychoacoustic basis of
silence-cued stop consonant perception is also necessitated by the fact that
Experiments 2-4 were relatively unsuccessful in disengaging subjects'
integrative processing strategies. The evidence for a higher-level,
speech-specific basis for the stop generation effect is suggestive at best, and
a demonstration that psychoacoustic interactions are not involved would
strengthen the argument considerably.


For the present stimuli, in which the difference between [plaet] and
[laet] rests entirely on a brief release burst, the most obvious
psychoacoustic hypothesis is that, at short temporal separations, the burst
suffers from forward masking by the preceding fricative noise, and therefore
becomes difficult to detect. If so, then this masking effect should occur also
when a burst of white noise is substituted for the [s] frication, provided that
the energy of the white noise is not substantially below that of the frication.
From the viewpoint of phonetic perception, however, the white noise is less
speech-like and therefore should be more easily filtered out in a
selective-attention task. If the "sl"-"spl" boundary does not rest on a




Repp:  Perceptual  Coherence  of  Speech 


psychoacoustic  interaction,  subjects  should  be  more  successful  in  identifying 
"blat"  and  "lat"  when  white  noise  replaces  the  [s]  precursor. 

Method 


Subjects.  The  same  9  subjects  as  in  Experiment  1b  participated. 

Stimuli. The five stimuli from the [plaet]-[laet] continuum were again
used. Instead of a natural [s] noise, however, a burst of white noise was
used as a precursor. The white noise was recorded from a General Radio 1390-A
random noise generator, low-pass filtered and digitized at half speed at a 20
kHz sampling rate. It differed from the [s] noise used previously (Exp. 1a
and Exps. 2-4) in three respects: (1) Its duration was 200 ms, versus 125 ms
for the [s] noise. (2) It was gated on and off abruptly, whereas the [s]
noise had gradual on- and offsets. (3) It had a flat spectrum, whereas the
spectrum of the [s] noise had a pronounced peak at about 8.6 kHz, which
projected by about 20 dB above a relative energy plateau ranging from 4 to 10
kHz. The spectral energy of the white noise matched that of the plateau; its
energy was higher than that of the [s] noise below 4 kHz and above 10 kHz, and
lower between about 8-9 kHz. Its energy at offset was considerably higher
than that of the fricative noise across the whole spectrum. All these differences
led to the expectation that the white noise would have a more pronounced
forward masking effect than the [s] noise, if such a psychoacoustic effect is
involved at all. On the other hand, relatively long duration, abrupt offset,
and flat spectrum are all uncharacteristic of natural fricative noises preceding
a stop closure.⁹
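The contrast in duration and gating described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the experiment: the actual noise was recorded from an analog generator and low-pass filtered, whereas this sketch draws Gaussian samples in software, omits the filter, and uses a hypothetical 10 ms linear ramp to stand in for the "gradual on- and offsets" of the [s] noise. The names `white_noise_burst` and `apply_linear_ramps` are invented for the example.

```python
import random

SAMPLE_RATE_HZ = 20_000  # 20 kHz sampling rate, as stated in the text


def white_noise_burst(duration_ms, seed=0):
    """Return an abruptly gated burst of Gaussian white noise.

    Assumption: software-generated Gaussian samples stand in for the
    recorded output of the analog noise generator.
    """
    rng = random.Random(seed)
    n_samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def apply_linear_ramps(samples, ramp_ms):
    """Return a copy with linear onset and offset ramps (gradual gating).

    Assumption: the ramp shape and duration are hypothetical; the text
    says only that the [s] noise had gradual on- and offsets.
    """
    n_ramp = SAMPLE_RATE_HZ * ramp_ms // 1000
    out = list(samples)
    for i in range(min(n_ramp, len(out) // 2)):
        gain = i / n_ramp
        out[i] *= gain        # fade in
        out[-1 - i] *= gain   # fade out
    return out


# 200 ms precursor with abrupt onset and offset (the white noise condition)
abrupt = white_noise_burst(200)
# 125 ms noise with gradual gating (a stand-in for the [s] noise envelope)
gradual = apply_linear_ramps(white_noise_burst(125), ramp_ms=10)
```

At a 20 kHz sampling rate the 200 ms burst spans 4000 samples and the 125 ms noise 2500 samples, so the two precursors differ in both length and envelope, as the text describes.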

The  stimulus  tape  matched  that  of  the  analytic  condition  in  Experiment  2. 
That is, silent intervals ranged from 0 to 160 ms, and "no-noise" stimuli were
interspersed. 

Procedure.  All  subjects  listened  first  to  the  tape  of  Experiment  1b  (an 
integrative  labeling  task)  and  then  to  the  pretest,  as  used  in  Experiments  2 
and  3  (stimuli  without  preceding  noise).  Instructions  for  the  main  test  were 
the  same  as  in  the  analytic  condition  of  Experiment  2:  Ignore  the  noise  and 
label the stimuli as beginning with "bl" or "l."

Results  and  Discussion 

Figure 6 shows the results, which are strikingly different from those of
Experiment 2 (cf. Fig. 2, right-hand panel). Over the range from 40-160 ms of
silence, the white noise precursor had no effect at all on subjects' ability
to identify the stimuli from the [plaet]-[laet] continuum, except for
introducing a slight bias against stop responses.¹⁰ In particular, the white
noise did not induce any stop percepts when it preceded stimuli 4 and 5. Only
when there was no silent interval between the noise and the speech did the
noise exert a perceptual effect, rendering stimuli 2-5 indiscriminable, while
stimulus 1 continued to receive a higher rate of stop responses. Note also
that, in this condition, subjects were equally willing to respond "bl" or "l,"
whereas in the corresponding condition of Experiment 2 (not shown in Fig. 2)
responses were exclusively "l." This suggests that the subjects in Experiment
5 considered the white noise as an extraneous signal that might obscure stop
consonant cues present in the speech signal, whereas the subjects in Experiment
2 perceived the [s] noise as part of the utterance, even when asked not
to do so, and thus were unwilling to consider the possibility of an inaudible
stop consonant.

Figure 6. Percent stop responses in Experiment 5. (Abscissa: silence duration in ms.)


It seems extremely unlikely that spectral or other properties of the
white noise were responsible for its reduced masking power, since it was a
more powerful signal than the [s] noise by most acoustic criteria. Although
the [s] noise was more intense between 8 and 9 kHz, the spectral peaks of the
labial release burst were in a region (below 4.5 kHz) where the white noise
exceeded the [s] noise in energy. Therefore, the results suggest that
psychoacoustic interference (i.e., forward masking) was involved only at the
very shortest closure intervals (less than 40 ms). Consequently, the reduction
in stop responses when an [s] noise precedes [plaet] stimuli by 40-80 ms
(see Fig. 2) probably does not represent psychoacoustic interference, but
rather a specifically phonetic effect reflecting the listener's tacit knowledge
about the minimal permissible duration of stop consonant closures in this
context. Apparently, listeners are compelled to apply this knowledge as long
as they perceive a coherent stream of speech. This conclusion is consistent
with that reached by Pastore et al. (1984), and it suggests that the two effects
of closure silence (stop suppression at short durations, stop generation
at longer durations) can be accounted for within a single theoretical framework,
that of perception in the "speech mode" (Liberman, 1982; Repp, 1982).

Summary and Conclusions

The  present  series  of  studies  addressed  the  question  of  the  origin  of  the 
auditory coherence of speech by focusing on one particularly striking phenomenon,
that of silence-cued labial stop consonants in fricative-liquid context.
This  phenomenon  illustrates  both  the  coherence  of  acoustically  heterogeneous 
speech  components  in  general  and  the  perceptual  integration  of  disparate  cues 
to  the  perception  of  a  particular  phonetic  contrast.  Between  the  fricative 
noise  and  the  resonances  resulting  from  production  of  the  liquid  consonant, 
there  is  an  abrupt  change  in  the  nature  and  location  of  the  sound  source  (from 
voiceless  and  dental  to  voiced  and  laryngeal)  and  in  spectral  composition 
(from  higher  to  lower  frequencies).  Nevertheless,  with  or  without  an 
intervening brief silent interval, listeners usually perceive both sounds as
part  of  a  coherent  speech  stream.  This  coherence  in  turn  gives  rise  to  a  stop 
consonant percept when a silent interval of appropriate duration (roughly,
80-200  ms)  is  present.  Thus  the  silence  itself  becomes  part  of  the  speech 
stream:  rather  than  interrupting  the  continuity  and  contributing  to  the 
perceptual  segregation  of  acoustically  disparate  signal  components,  the 
silence  functions  as  a  carrier  of  phonetic  information.  Only  when  the  silence 
duration  clearly  exceeds  the  acceptable  limits  of  a  stop  consonant  closure 
does  it  lead  to  perceptual  segregation  of  the  signal  components. 

It  was  hypothesized  that  the  integrative  function  that  gives  rise  to 
these  phenomena  is  a  characteristic  of  perception  in  the  speech  mode — that  is, 
of perceiving the information that is most useful for linguistic communication.
One way of testing this hypothesis would be to lead listeners to perceive
the same stimuli as either speech or nonspeech. Some evidence favoring
the  hypothesis  has  already  been  obtained  using  variants  of  that  method  (Best 
et  al.,  1981;  Repp,  1981).  A  somewhat  different  approach  was  taken  here.  It 
was  argued  that,  if  perceptual  integration  of  the  form  studied  here  is  a 
speech-specific  function,  it  might  be  possible  to  influence  its  operation  by 
directly  manipulating  the  listeners'  interpretation  of  the  speech  stimulus, 
staying  entirely  within  the  speech  mode.  The  success  of  this  approach  was  not 
guaranteed,  of  course,  since  manipulation  of  listeners'  strategies  through 
instructions  may  simply  be  ineffective.  In  the  absence  of  a  convincing 
psychoacoustic explanation for the perceptual integration of speech components,
however, negative findings may tell us that certain perceptual strategies
are not easily modified or abandoned, not that they are not
speech-specific.

In  a  series  of  experiments  (Exps.  2-5)  following  a  basic  demonstration  of 
silence-cued  stop  consonants  (Exp.  1),  it  was  attempted  to  alter  subjects' 
interpretation  of  the  stimulus  by  instructing  them  to  mentally  separate  the 
fricative noise from the following signal portion. The relative ineffectiveness
of the selective-attention instructions with stimuli of seemingly minimal
acoustic  coherence  is  interpreted  as  evidence  for  the  relative  stability  of 
the  perceptual  integration  function.  Experiment  3  indicated,  however,  that 
some  subjects  can  be  successful  in  this  task,  and  Experiment  4  showed  that  a 
low-uncertainty  paradigm  also  facilitates  selective  attention.  These  results 
parallel  those  obtained  in  studies  of  categorical  perception  (see  Repp,  1984a, 
for a review), where subjects frequently need to disengage or ignore another
basic function of the speech mode, that of phonetic classification, in order
to discriminate speech stimuli. In these studies, it seems that success in
within-category discrimination often requires perceptual strategies that
operate outside the speech mode. The present task, too, could in principle
have been accomplished by listening specifically for the release burst, though
there was no evidence that the subjects used this "auditory" strategy. Rather,
the few successful subjects appeared to be able to do what the instructions
asked for: to ignore the fricative noise and listen to the remainder of
the stimulus as speech, a skill that trained phoneticians presumably would
have in their repertoire.

One  way  of  ignoring  a  fricative  noise  is  to  think  of  it  as  a  nonspeech 
hiss  arising  from  a  source  outside  the  speaker's  vocal  tract.  That  this 
strategy  could  be  effective  is  clear  from  Experiment  5  which,  by  substituting 
a nonspeech noise for the frication, actually created the situation that subjects
otherwise might try to imagine. The ease with which the subjects carried
out the selective-attention instructions in this situation argues against
a psychoacoustic account of perceptual integration and of the effect of the
silent interval on stop consonant perception. This latter effect has two aspects,
which were termed "stop suppression" (short intervals) and "stop
generation" (longer intervals). On the basis of the results of Experiment 5
it was concluded that both of these effects are likely reflections of
speech-specific perceptual criteria, with only the suppression effect at
extremely short closure silences having a psychoacoustic origin.¹¹


In conclusion, then, the results of the present experiments are consistent
with a theoretical view of speech perception that postulates a number of
specific (though not necessarily unique) functions. These perceptual functions,
which include the perceptual integration of speech components, are assumed
to be driven by an internal representation of the regularities of spoken
language. How this representation should be characterized and how it is acquired
are fundamental questions for future research.

Footnotes

¹The question posed here is similar in many ways to that underlying
categorical perception research (see Repp, 1984a), but the methodology is different.
Categorical perception experiments examine subjects' ability to
discriminate stimulus differences within phonetic categories; here, the focus
is on listeners' ability to ignore one part of a stimulus (a skill that may
play a role in some discrimination tasks). Both tasks are difficult because
listeners tend to adhere to their habitual mode of phonetic perception, which
is categorical and integrative. No claim is made here that this type of
perceptual mode is specific to speech; it is called "phonetic" only because
the stimuli happen to be speech. That being so, however, many specific instances
of perceptual integration may indeed be speech-specific, simply because
they have no parallels in other domains of experience.

²To be sure, the [s] noise must not be too long, and its offset and the
[l] onset not too gradual: otherwise, no stop percepts will be obtained. The
presence of stop manner cues in the [s] noise was irrelevant in Experiments
2-4, because subjects' attention was directed toward the stimulus portion
following the silence. As far as that portion is concerned, it was sufficient
that it not elicit any stop percepts in isolation. No claim is being made
that either signal portion contained no cues whatsoever to stop consonant
perception (see also Footnote 6).

³Some subjects, especially in Experiment 1a, spontaneously gave "sp-l"
responses, indicating that they detected stop manner cues in the frication,
while at the same time perceiving a gap between the [s] and the rest of the
stimulus. These responses were treated as equivalent to "s-l"; thus they are
not included in the "spl" percentages plotted in Figure 1.

⁴The phonetic symbol [p] represents a voiceless unaspirated labial stop
consonant, which in English orthography is rendered as "p" in some contexts
(e.g., following a voiceless fricative in the same syllable) but as "b" in
others. Throughout this paper, phonetic symbols in brackets denote stimuli or
the speaker's intentions, whereas orthographic symbols in quotes refer to responses
or the listeners' percepts.

⁵For the author and most subjects, excision of the natural labial release
burst in [plaet] resulted in elimination of the stop percept. Some listeners,
however, still claimed to hear a "b," which may reflect a special sensitivity
to weak coarticulatory cues in the [l] portion. These coarticulatory cues may
reside in spectral or amplitude properties of the signal immediately following
the release burst or, perhaps more likely, in the shorter duration of the [l]
as compared to one articulated in absolute utterance-initial position. One
additional subject in Experiment 2 and two additional subjects in Experiment 3
were excluded because they perceived all stimuli from the [plaet]-[laet] continuum
as "blat."

⁶One of these two subjects had participated in Experiments 1a and 2. In
the labeling task of Experiment 1a, which used stimulus 5 of the
[plaet]-[laet] continuum, she gave 90 percent stop responses at closure durations
of 100 and 150 ms. In Experiment 2, for stimuli 4 and 5 with 120 and
160 ms of silence, she gave 63 percent stop responses in the integrative
condition, 70 percent in the analytic condition, and 0 percent when there was
no preceding [s] noise. In Experiment 3, however, she gave not a single stop
response to the same stimuli with silent intervals of 100 and 150 ms. Clearly,
she had discovered an effective selective attention strategy in Experiment
3, perhaps as a result of going through the task of Experiment 4 (where she
likewise did not give any stop responses in the comparable stimulus conditions).

⁷It might be noted that while the inexperienced subjects performed at
chance level in the 0 ms condition, the author as a pilot subject obtained a
score of 85 percent correct. Thus, it is not impossible to discriminate the
[plaet] and [laet] components in this condition, but a different perceptual
strategy seems to be required (viz., listening for a certain difference in auditory
quality caused by the presence versus absence of a release burst).
Note that this strategy is nonphonetic in character, unlike the phonetic
dissociation strategy requested by the analytic listening instructions. Indeed,
those few subjects who seemed to be successful analytic listeners in
Experiments 3 and 4 still failed to discriminate the stimuli at the very
shortest closure durations. It is likely that nonphonetic strategies would be
fostered by extensive training with feedback, which is one reason why this
method was not used to induce analytic phonetic strategies.

⁸The d' values were also computed for individual subjects and then averaged.
The results were not substantially different from the global d' values
shown in Figure 5. Although certain distortions in the global values may have
occurred due to different degrees of criterion variability in different
experiments, the individual subjects' values are even more distorted because
of the many occurrences of response percentages of 0 and 100, which necessitate
setting an arbitrary upper limit for d'. For this reason, the d' values
computed from the average response percentages were preferred for this informal
comparison among experiments.

⁹Of course, the white noise did not sound like a fricative noise (at
best, it sounded remotely [f]-like). For this reason, an integrative listening
condition, in which subjects try to interpret the noise as a fricative,
was not considered. The point here is that, if psychoacoustic interactions
are involved, they should not depend on the speechlikeness of the noise.

¹⁰This tendency, as well as its apparent increase with closure duration,
was  due  to  two  subjects'  data  only. 

¹¹Another possible auditory interaction that was not considered seriously
here, but that may warrant some further investigation, is auditory short-term
adaptation (see Delgutte & Kiang, 1984). The [s] precursor should adapt
high-frequency neurons more than low-frequency neurons, so that the auditory
response to the following signal portion would be more vigorous in the
low-frequency regions, which might favor labial stop percepts. There are several
problems with that hypothesis, however: (a) The long temporal range of
the stop generation effect (Exp. 1) exceeds the range of auditory adaptation.
(b) The stop suppression effect remains unexplained. (c) The ability of some
subjects to disengage the stop generation effect argues against peripheral auditory
factors. (d) The [s] noise spectrum is not differentiated enough in
the low-frequency region to substantially alter the shape of the "auditory
spectrum" at the onset of the following signal. (e) The stop generation effect
is reduced by an increase in fricative noise duration (Exp. 1b; Repp,
1984c).
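
The problem raised in Footnote 8, that response percentages of 0 and 100 force an arbitrary upper limit on d', can be illustrated with a short sketch. It is not the computation used in the paper. It assumes the standard equal-variance definition d' = z(H) - z(FA), treats "stop" responses to the [plaet] end of the continuum as hits and to the [laet] end as false alarms, and clamps extreme proportions to 1/(2N) and 1 - 1/(2N), which is one common convention; the text does not say which limit was actually chosen. The function name `d_prime` is hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate, n_trials):
    """Equal-variance d' = z(H) - z(FA).

    Proportions of 0 or 1 would give infinite z scores, so they are clamped
    to 1/(2N) and 1 - 1/(2N). This clamp is an assumed convention, standing
    in for the unspecified "arbitrary upper limit" mentioned in Footnote 8.
    """
    lowest = 1.0 / (2 * n_trials)
    highest = 1.0 - lowest
    h = min(max(hit_rate, lowest), highest)
    fa = min(max(false_alarm_rate, lowest), highest)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(h) - z(fa)


# Chance performance: equal "stop" rates to both stimulus types.
chance = d_prime(0.5, 0.5, n_trials=20)
# Perfect performance: finite only because of the clamp.
ceiling = d_prime(1.0, 0.0, n_trials=20)
```

With 20 trials per cell the clamp caps d' at about 3.92, so every subject scoring 0 and 100 percent receives the same ceiling value regardless of true sensitivity. That is the distortion the footnote gives as the reason for preferring d' values computed from averaged response percentages.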

DEVELOPMENT OF THE SPEECH PERCEPTUOMOTOR SYSTEM*


Michael Studdert-Kennedy†


Introduction 

The  intent  of  the  present  paper  is  to  reflect  on  the  development  of  the 
speech perceptuomotor system in light of the infant's evident capacity for
intermodal (or, better, amodal) perception, discussed by Meltzoff and by Kuhl in
this volume. The central issue is imitation. How does a child (or, for that
matter, an adult) transform a pattern of light or sound into a pattern of
muscular controls that serves to reproduce a structure functionally equivalent to
the  model?  The  hypothesis  to  be  outlined  is  that  imitation  is  a  specialized 
mode  of  action,  in  which  the  structure  of  an  amodal  percept  directly  specifies 
the  structure  of  the  action  to  be  performed  (cf.  Meltzoff  &  Moore,  1983). 

The  General  Function  of  Perception 

Let  us  begin  by  considering  briefly  the  function  of  perception  from  an 
ethological perspective (Gibson, 1966, 1979; von Uexküll, 1934). The general
function  of  perception  is  to  control  action.  Perception  and  action  are  two 
terms  in  a  functional  system  that  permits  an  animal  to  survive.  To  survive, 
an  animal  must  constantly  negotiate  a  physical  world,  moving  around,  over  or 
under  objects  in  its  path,  seeking  food  or  mates,  escaping  from  predators. 
The  actions  that  an  animal  takes,  its  coordinated  patterns  of  goal-seeking 
movements,  are  more  or  less  precisely  matched  to  the  world  it  perceives;  and 
the  world  it  perceives  is  constantly  modulated  by  the  actions  it  takes.  Thus, 
action  and  perception  are  mutually  entailed  components  of  a  single  system: 
each  fits  the  other  as  key  fits  lock. 

How  is  the  fit  achieved?  How  are  the  varying  patterns  of  light,  sound, 
temperature, pressure that determine perception transduced into the neuromuscular
patterns that determine action? Can we find a single set of descriptive
terms  that  will  match  all  the  various  sensory  modalities  with  the  single 
modality  of  action?  We  may  approach  an  answer  to  these  questions  by  asking 
another:  What  information  do  light,  sound  and  other  modes  of  energy  convey? 
Following  Gibson  (1966,  1979)  we  answer  quite  generally:  Information  that 
specifies  the  structures  of  objects  and  events  to  which  action  must  adapt. 

We may note two properties of perceived object-event structures. First,
they are amodal. We perceive a desk, say, through a pattern of light structured
by its light-reflecting properties, or by touch through the pattern of
mechanical resistance it offers to our fingers. A bat, being equipped with
sonar, might perceive the desk by virtue of the desk's sound-reflecting properties.
Similarly, we normally perceive a spoken word through a pattern of
sound, structured by the coordinated articulations of a speaker. To the extent
that these articulations reflect radiant energy within the visible spectrum,
we may also perceive the word by virtue of its optical structure. The
deaf-blind, using the Tadoma method, may even perceive the word by touch (Norton,
Schultz, Reed, Braida, Durlach, Rabinowitz, & Chomsky, 1977). What we
perceive, then, are objects and events, independent, in principle, of the sensory
modalities through which we perceive them.

The second point to note about object-event structures is that their perceived
qualities vary with the perceiving organism. The "same" object has
different utilities for different animals, or for the same animal at different
times. Objects and events differ in what von Uexküll (1934) termed their
"functional tones," what Gibson (1966) termed their "affordances." The puddle
that a person steps over affords a dog an opportunity to drink; the desk that
offers support for a writing pad on one occasion may serve as a seat on another;
a word spoken in Mandarin is merely a vocalization to someone who knows no
Chinese. Thus, different animals perceive different worlds (von Uexküll's
Umwelten), each structured by the animal's potential actions, just as its actions
are structured by its perceived world.

The  Function  of  Speech  Perception 
The Speech Percept as Amodal

The  first  function  of  speech  perception  is  social  and  communicative,  a 
pragmatic  function  analogous  to  the  general  function  of  perception  discussed 
above.  As  the  carrier  of  language,  speech  offers  meaning,  that  is  to  say 
(very  broadly),  information  conveying  the  structure  of  a  social  world  within 
which  an  individual  may  act.  The  individual,  by  acting  in  response,  whether 
linguistically  or  non-linguistically,  then  modulates  the  perceived  structure 
of  her  social  world. 

A  second  function  of  speech  perception,  ontogenetically  prior  to  the 
first  and  of  more  immediate  interest  here,  is  in  language  acquisition.  While 
the  adult  may  listen  simply  for  meaning,  the  learning  child  must  listen  both 
for  meaning  and  for  information  specifying  a  talker’s  articulatory  gestures. 
This  second  perceptual  function  therefore  controls  action  in  the  more  limited 
sense of providing a model for imitation.

Before we consider imitation, let us explicate and justify the claim that
speech carries information specifying a talker's articulatory gestures. Notice,
first, that this is not the customary account. For example, Abercrombie
(1967) characterizes one form of the information conveyed by speech as
linguistic and segmental, intending by this a sequence of phonetic elements,
the consonants and vowels of a phonetic transcription. This is certainly correct,
at one level of description, as our ability to read and write alphabetically
demonstrates. However, a transcription is so far removed from the signal
that most people in the world who can speak and understand speech cannot
read or write.

What, then, is the difference between the information in a spoken utterance
and the information in its written counterpart? Following Carello, Turvey,
Kugler, and Shaw (1984) (see also Turvey & Kugler, 1984), we may say that
the difference is between information that specifies and information that indicates.
The information in a spoken utterance is not arbitrary: its acoustic
structure is a lawful consequence of the articulatory gestures that shaped
it. In other words, its acoustic structure is specific to those gestures, so
that a human listener (who knows the language spoken) has no difficulty in
following the specifications and organizing her own articulations to reproduce
the utterance. By contrast, the form of a written transcription is an arbitrary
convention, a string of symbols that indicate to the reader what she is
to do, but do not specify how she is to do it. The important point here is
that indicational information cannot control action in the absence of information
specific to the act to be performed. For example, a road sign indicates
that we are to stop, but we can only follow the instruction if we have information
specifying our velocity and our distance from the required stopping
point (Turvey & Kugler, 1984). Similarly, we can only reproduce an utterance
from its transcription if we have information specifying the correspondences
between the symbol string and the motor control structures that must be engaged
for speaking. It is these correspondences that the illiterate has not
discovered. Just how these two forms of linguistic information are related
is, of course, a central issue of speech research. My concern here is merely
to make the distinction. For we shall be led astray in our study of speech
perception (and so of speech acquisition), if we strive to equate the linguist's
description of speech as a string of symbols with the dynamic structure
of the speech signal itself.

Consider, here, an early interpretation of the lip-reading studies of
McGurk and MacDonald (1976). These authors discovered that listeners' perceptions
of a syllable presented over a loudspeaker could be changed, if they
simultaneously watched a videotape of a speaker producing another syllable.
For example, presented with audio [ba] and video [da], subjects typically report
the latter, optically specified syllable; presented with audio [na] and
video [ba], subjects typically report [ma], a combination of the two. Such
observations are consistent with the notion that subjects engage two independent
phonetic systems, drawing manner and voicing features from the acoustic
structure, place of articulation features from the optic structure (MacDonald
& McGurk, 1978). This interpretation assumes that we perceive speech by
extracting phonetic features and combining them to form phonetic segments; in
other words, it assumes that the speech signal carries information about a
string of linguistic symbols. As already remarked, this is true at one level
of description. However, this interpretation bypasses the actual event specified
by the dynamic acoustic-optic structure and does not address the puzzle
of its transformation into a static linguistic symbol.

Moreover, the featural interpretation breaks down in the face of other
findings. For example, presented with audio [ga] and video [ba], subjects
typically report a cluster [b'ga] or [g'ba]; presented with the reverse
arrangement, audio [ba] and video [ga], subjects often report a sort of acoustic-optic
blend, [da]. In these instances, the percept corresponds either to
both inputs or to neither, so that the notion of two independent and additive
phonetic systems breaks down.

While much remains to be done before we have a satisfactory account of
such findings, the effect seems to arise from a process by which two continuous
sources of information, acoustic and optic, are actively combined at a
precategorical level where each has already lost its distinctive sensory
quality (Summerfield, 1979). In other words, the McGurk effect (and, indeed,
normal lipreading as practiced in aural rehabilitation) is only possible because
acoustic and optic structures specify an amodal event: a coordinated
pattern of articulatory action.

Imitation 

A  general  capacity  to  imitate  is  rare  among  animals.  The  specialized 
capacity  to  imitate  vocalizations  is  confined  to  a  few  species  of  birds  and  of 
marine  mammals,  and  to  man.  Here  we  should  distinguish  between  mimicry  and 
repetition,  or  reproduction.  The  Indian  mynah  bird,  for  example,  mimics  human 
speech  quite  precisely,  within  the  limits  of  its  vocal  apparatus  (Klatt  & 
Stefanski,  1974).  However,  a  human  speaker  repeats  the  utterances  of  another 
(when  not  deliberately  attempting  mimicry)  by  producing  a  functionally  equiva¬ 
lent,  though  acoustically  distinct,  pattern  of  sound.  Given  within-species 
individual  differences  in  size  and  structure,  we  may  reasonably  suppose  that 
the  production  of  distinct,  yet  functionally  equivalent,  acts  is  the  normal 
mode of animal imitation, whether in human speech or in, say, the nest-building
of a young chimpanzee. In any event, both mimicry and reproduction call
on  a  specialized  capacity  for  finding  in  the  perceptual  array  an  organized 
pattern  of  information  specific  to  an  organized  pattern  of  action.  To  find  a 
pattern  the  imitator  must  find  both  the  pieces  of  an  act  and  their  spatio-tem¬ 
poral  relations  (Fentress,  1984). 

Consider, for example, the following transcription of ten attempts by a
15-month-old girl to say pen, within a single half-hour's recording session:
[mo^ ,''A,de'^'?,hin,'''b8, p^in, t%t*^nt*^n,ba*’,d*^au’*,bua] (Ferguson & Farwell,
1975). Note once again that the transcriptions are merely convenient (and
approximate)  indicators  of  what  the  child  did.  For  what  the  child  evidently 
did, in each case, was to extract from the sound pattern of pen information
specific  to  certain  articulatory  gestures,  such  as  lip  closure,  lingua-alveo¬ 
lar  closure,  velum  lowering,  glottal  narrowing  and  spreading.  Thus  the  child 
analyzed  the  word  (with  varying  success)  into  its  component  gestures,  or 
pieces,  but  could  not  discover,  at  least  motorically,  their  spatio-temporal 
relations.  Perhaps  we  have  here  an  instance  of  the  necessary  sequence  in 
learning  to  speak,  or  indeed  in  learning  to  reproduce  any  act  performed  by  an¬ 
other:  first  perceptual  analysis,  then  motor  synthesis.  We  can  hardly  doubt 
that  a  capacity  to  perceive  the  pieces  of  an  act  and  their  relations,  and  to 
reproduce  them  in  our  own  behavior,  rests  on  some  form  of  structural  (anatomi¬ 
cal,  physiological)  correspondence  between  imitator  and  model.  This  observa¬ 
tion  leads  us  to  a  brief  digression. 

Can  Non-human  Animals  Perceive  Speech? 

The  answer  to  this  question  must  depend  on  what  we  mean  by  "perceive 
speech."  Here  we  have  been  misled,  it  would  seem,  by  the  behaviorist  view  of 
perception  as  a  mere  matter  of  psychophysical  capacity.  We  have  tended  to  de¬ 
scribe  speech  in  purely  acoustic  terms  as  a  collection  of  "cues,"  without  re¬ 
gard  to  the  articulatory  events  that  the  cues  specify,  and  then  to  suppose 
that  any  animal  able  to  discriminate  these  cues  can  perceive  speech.  Yet  the 
psychophysical capacities of an unlimited set of animals, from the human infant
to the chinchilla, may suffice to discriminate among formant transitions,
formant  onset  frequencies,  brief  silences,  patches  of  noise,  and  so  on.  How¬ 
ever,  these  capacities  may  not  suffice  to  discover  the  functional  relations 
among  the  perceptual  pieces. 

In fact, the perceptual status of communicative signals varies even for
closely related species. For example, while two species of macaque (pig-tail
and  bonnet)  and  an  African  vervet  may  learn  an  arbitrary  discriminative  re¬ 
sponse  to  contrasting  calls  of  the  Japanese  macaque,  the  latter  learns  the  re¬ 
sponse  significantly  more  rapidly  (Zoloth,  Petersen,  Beecher,  Green,  Marler, 
Moody,  &  Stebbins,  1979).  Moreover,  the  processes  underlying  the  Japanese 
macaque's  response  to  its  own  calls  are  evidently  localized  in  the  left  cere¬ 
bral  hemisphere,  while  those  of  the  other  two  species  of  macaque  are  not 
(Petersen, Beecher, Zoloth, Moody, & Stebbins, 1978; cf. Heffner & Heffner,
1984). Whether this hemispheric specialization has a perceptuomotor origin
(as  in  the  human:  see  below),  we  do  not  yet  know.  The  point  here  is  that,  if 
we  show  a  particular  discriminative  task  to  be  within  the  psychophysical 
competences  of  two  different  species,  we  have  not  thereby  shown  their  percepts 
to  be  equivalent. 

In short, if the structure of perception can properly be said to be tuned
to  the  structure  of  the  perceiver's  capacity  for  action,  a  non-human  animal's 
perception  of  speech  must  differ  radically  from  a  human's.  What  actions  of  a 
macaque,  say,  are  controlled  by  its  perception  of  speech?  What  events  do  the 
acoustic  patterns  of  speech  specify  for  a  macaque?  Presumably,  the  patterns 
do  not  specify  articulatory  gestures,  and  the  actions  brought  under  control  in 
the  laboratory  (such  as  lever  holding  or  escape  from  shock)  are  the  arbitrary 
choices  of  an  experimenter,  adventitious  and  ethologically  empty.  In  other 
words,  the  information  in  speech  may  indicate  to  a  non-human  animal  what  it 
should  do  in  a  particular  situation,  but  (pace  the  mynah  bird)  the  information 
cannot  specify  for  the  animal,  as  it  does  for  a  human,  the  speaker's  pattern 
of  articulatory  gestures. 


Perceptuomotor  Relations  in  the  Infant 

Since  the  infant,  by  definition,  does  not  speak,  our  understanding  of 
perceptuomotor  development  over  the  first  year  of  life  must  be  largely 
inferential.  Here  I  will  consider  three  classes  of  evidence,  concerning:  (1) 
the adult perceptuomotor system, particularly its cerebral locus; (2) infant
perceptual  capacity;  (3)  infant  behavior,  reflecting  hemispheric  specializa¬ 
tion  for  speech  perception. 

The Adult Perceptuomotor System


Aphasia  studies  for  over  a  century  have  suggested  that  the  right  cerebral 
hemisphere  of  most  right-handed  individuals  is  essentially  mute  (see,  for 
example, Milner, 1974). Differential anesthesia of left and right hemispheres
by intracarotid sodium amytal injection (preparatory to possible brain surgery)
has confirmed this fact experimentally (Borchgrevink, 1982; Milner,
Branch, & Rasmussen, 1964). Thus, speech motor control is vested in the left
hemisphere of most individuals (roughly 90% of the population). (The origins
of a population diversity, such that speech motor control is vested in the
left hemisphere for some 90%, in the right hemisphere for some 10% of the
population,  are  not  yet  understood.) 



Studdert-Kennedy: Development of the Speech Perceptuomotor System


Since  any  imitative  behavior  calls  for  close  neurophysiological  connec¬ 
tions  between  perceptual  and  motor  processes,  we  might  predict  that  left  hemi¬ 
sphere  control  of  articulation  would  be  coupled  with  left  hemisphere  speciali¬ 
zation  for  speech  perception.  Numerous  monotic  and  dichotic  studies  of  normal 
subjects  have  confirmed  this  prediction,  and  have  demonstrated  a  double 
dissociation  of  left  and  right  hemispheres  for  the  perception  of  speech  and 
non-speech (e.g., Kimura, 1961a, 1961b; Studdert-Kennedy & Shankweiler, 1970).
Furthermore,  studies  of  split-brain  patients  (whose  cerebral  hemispheres  have 
been  surgically  separated  for  relief  of  epilepsy)  have  shown  that,  while  the 
right  hemisphere  may  recognize  the  meaning  of  a  word  from  its  overall  auditory 
shape,  only  the  left  hemisphere  can  carry  out  the  phonetic  analysis  necessary 
to establish a new word in an individual's lexicon (Zaidel, 1974, 1978).
(Phonetic  analysis  refers,  of  course,  to  analysis  of  a  word  into  its  articula¬ 
tory  components  and  to  recognition  of  the  relations  among  them,  as  discussed 
above.)  Thus,  we  have  solid  evidence  that  the  adult  speech  perceptuomotor  sys¬ 
tem  is  a  left  hemisphere  function. 

Infant  Perceptual  Capacity 

As  is  well  known,  infants  in  the  first  six  months  of  life  can 
discriminate  virtually  any  adult  speech  contrast  on  which  they  are  tested  (for 
reviews, see Aslin, Pisoni, & Jusczyk, 1983; Eimas, 1982). Much of the infant
research has been carried out with synthetic speech continua on which adults
typically  display  "categorical  perception,"  that  is,  good  discrimination  be¬ 
tween  sounds  that  fall  into  different  adult  phonetic  categories,  but  poor 
discrimination  between  sounds  that  fall  into  the  same  phonetic  category. 
Infants  have  generally  displayed  a  similar  pattern,  and  this  outcome  has  been 
interpreted as evidence that infants are prepared at birth, or very soon
after,  to  perceive  speech  in  terms  of  adult  phonetic  categories  (Eimas,  1982). 
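
The pattern called categorical perception can be made concrete with a small numerical sketch. The code below is illustrative only: the 7-step continuum, the logistic identification function, and its boundary and slope values are hypothetical choices, not data from any study cited here. It also assumes, for the sake of the example, that a listener in an ABX trial discriminates solely by covertly labeling each sound and guesses when the labels of A and B agree; under that assumption the predicted proportion correct works out to 0.5 * (1 + (pA - pB)^2), where pA and pB are the probabilities of assigning A and B to the first category.

```python
import math


def p_category_one(step, boundary=4.5, slope=2.5):
    """Hypothetical logistic identification function for a 7-step continuum."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def predicted_abx(p_a, p_b):
    """ABX proportion correct if the listener relies only on covert labels."""
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


# Probability of a "category one" label at each step of the continuum.
identification = {step: p_category_one(step) for step in range(1, 8)}

# Predicted discrimination for each pair of adjacent steps.
discrimination = {
    (a, a + 1): predicted_abx(identification[a], identification[a + 1])
    for a in range(1, 7)
}

# The pair that straddles the category boundary should be discriminated best.
peak_pair = max(discrimination, key=discrimination.get)

for pair, score in discrimination.items():
    print(pair, round(score, 3))
print("peak at", peak_pair)
```

Under these assumptions, predicted discrimination stays near chance for pairs drawn from within a category and peaks for the pair that straddles the boundary, which is the signature described in the text. The sketch says nothing about whether human or animal listeners actually rely on labels alone.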

This  interpretation  has  been  weakened  by  two  sets  of  findings.  First,  we 
now  know  that  categorical  perception  is  not  peculiar  to  speech,  nor  even  to 
audition  (e.g.,  Pastore  et  al.,  1977).  Second,  Kuhl  and  her  colleagues  (Kuhl, 
1978;  Kuhl  &  Miller,  1978;  Kuhl  &  Padden,  1983)  have  demonstrated  categorical 
discrimination  along  synthetic  speech  continua  for  macaques  and  chinchillas. 
The  issue  is  complicated  by  the  fact  that  speakers  of  different  languages  may 
display  different  boundaries  between  the  phonetic  categories  of  a  continuum 
(see Repp, 1984) and we may suspect (following the argument of the previous
section)  that  quite  different  processes  underlie  the  seemingly  equivalent  hu¬ 
man  and  animal  behavior.  However,  let  us  assume  that  categorical  perception 
is  essentially  a  psychophysical  phenomenon,  susceptible  perhaps  to  effects  of 
learning  and  attention,  but  based  on  the  psychoacoustic  tuning  of  the 
mammalian  auditory  system. 

Nonetheless,  we  have  ample  other  evidence  that  speech  already  has  a 
unique  status  for  the  infant  within  a  few  hours  or  days  of  birth.  For  exam¬ 
ple,  neonates  can  discriminate  speech  from  non-speech  (Alegria  &  Noirot,  1978, 
1982),  prefer  speech  to  non-speech  (Hutt,  Hutt,  Lenard,  Bernuth,  & 
Muntjewerff,  1968),  and  prefer  their  mother’s  voice  to  a  stranger’s  (DeCasper 
&  Fifer,  1980),  provided  she  speaks  with  normal  intonation  rather  than  in 
word-by-word citation (Mehler, Barrière, & Jasik-Gerschenfeld, 1978). However,
the strongest evidence for the unique status of speech comes from studies
of  infant  hemispheric  specialization. 

Cerebral Asymmetry for Speech in Infants

A  number  of  studies  have  demonstrated  dissociation  of  the  left  and  right 
sides  of  the  brain  for  perceiving  speech  and  non-speech  sounds  at,  or  very 
shortly  after,  birth.  These  include  both  physiological  and  behavioral  stud¬ 
ies.  For  example,  Molfese,  Freeman  and  Palermo  (1975)  measured  auditory 
evoked  responses,  over  left  and  right  temporal  lobes,  of  10  infants  aged  from 
one week to 10 months. Their stimuli were four naturally spoken monosyllables,
a C-Major piano chord, and a 250-4000 Hz burst of noise. Median amplitude
of response was higher over the left hemisphere for all four syllables in
nine  out  of  ten  infants,  higher  over  the  right  hemisphere  for  the  chord  and 
the  noise  in  all  ten  infants.  Molfese  (1977)  has  reported  similar  asymmetries 
for  syllables  and  pure  tones  in  neonates. 

Dissociation  between  responses  to  speech  and  non-speech  has  also  been 
demonstrated  by  Best,  Hoffman,  and  Glanville  (1982).  These  authors  tested 
forty-eight 2-, 3-, and 4-month-old infants for ear differences in a memory-based
dichotic task. They used a cardiac orienting response to measure recovery
from habituation to synthetic stop-vowel syllables and to Minimoog
simulations  of  concert  A  (440  Hz),  played  on  different  instruments.  In  the 
speech  task,  a  single  dichotic  habituation  pair  (either  /ba-da/  or  /pa-ta/) 
was  presented  nine  times  at  randomly  varying  intervals.  On  the  tenth  presen¬ 
tation,  one  ear  again  received  its  habituation  syllable,  while  the  other  re¬ 
ceived  a  test  syllable  (either  /ga/  or  /ka/),  differing  in  place  of  articula¬ 
tion  from  both  habituation  syllables.  An  analogous  procedure  was  followed  in 
the  musical  note  task.  The  results  showed  significantly  greater  recovery  of 
cardiac response for right ear test syllables in the 3- and 4-month-olds, and
for  left  ear  musical  notes  in  all  age  groups.  The  authors  propose  that 
right-hemisphere  memory  for  musical  sounds  develops  before  left-hemisphere 
memory  for  speech  sounds,  and  that  the  latter  begins  to  develop  between  the 
second  and  third  months  of  life. 

A  further,  particularly  telling  result,  in  light  of  the  presumed  amodal 
nature  of  the  speech  percept,  comes  from  a  study  by  MacKain,  Studdert-Kennedy, 
Spieker,  and  Stern  (1983).  These  authors  showed  that  5-  to  6-month-old 
infants preferred to look at the face of a woman repeating the disyllable they
were hearing (e.g., [zuzi]) rather than at the synchronized face of the same woman
repeating another disyllable (e.g., [vava]). Thus, as in the study of Kuhl
and Meltzoff (1982; Kuhl, this volume), infant preferences were for natural
structural  correspondences  between  acoustic  and  optic  information,  specifying 
the  same  articulatory  event. 

However, the  most  remarkable  aspect  of  the  study  by  MacKain  et  al.  (1983) 
was  that  infant  preferences  for  a  match  between  the  facial  movements  they  were 
watching  and  the  speech  sounds  they  were  hearing  were  only  significant  when 
the  infants  were  looking  to  their  right  sides.  We  can  interpret  this  result 
in  the  light  of  work  by  Kinsbourne  and  his  colleagues  (e.g.,  Kinsbourne,  1972; 
Lempert  &  Kinsbourne,  1982).  Their  work  suggests  that  attention  to  one  side 
of  the  body  may  facilitate  processes  for  which  the  contralateral  hemisphere  is 
specialized.  If  this  is  so,  we  may  infer  that  infants  with  a  preference  for 
matches  on  their  right  side  were  revealing  a  left  hemisphere  sensitivity  to 
articulation  specified  by  acoustic  and  optic  information. 

The work by MacKain and her colleagues has not yet been replicated. But
if it proves reliable, we have some evidence that 5- to 6-month-old infants,
close to the onset of babbling, already display a left hemisphere sensitivity
to the amodal structure of speech events. For the moment, this seems to be
as close as we have come to detecting an incipient capacity for imitation on
which  spoken  language  is  based. 

Summary  and  Conclusions 

Perception  and  action  are  mutually  entailed  components  of  a  single  sys¬ 
tem.  Their  interlocking  operation  is  possible  because  the  information  picked 
up  by  a  perceptual  system  is  amodal  and  directly  specifies,  within  the  con¬ 
straints  of  the  actor's  goal,  the  action  to  be  performed. 

Imitation  is  a  specialized  mode  of  action,  requiring  the  imitator  to  find 
in  the  act  of  a  model  both  the  pieces  of  the  act  and  their  spatio-temporal  re¬ 
lations.  Imitation  also  calls  for  close  neurophysiological  connections  be¬ 
tween  perception  and  motor  control.  For  speech  these  perceptuomotor  connec¬ 
tions  are  localized  in  the  left  cerebral  hemisphere. 

Studies of infant speech perception have shown that infants are sensitive
to  structural  correspondences  between  acoustic  and  optic  specifications  of 
speech,  and  that  their  left  cerebral  hemispheres  are  differentially  activated 
by  speech  sounds  soon  after  birth.  We  also  have  preliminary  evidence  for  left 
hemisphere  sensitivity  to  the  amodal  structure  of  speech  by  the  fifth  or  sixth 
month  of  life. 

The  approach  to  speech  perceptuomotor  development  outlined  above  also 
promises  an  ontogenetic  solution  to  the  vexed  problem  of  the  incommensurabili¬ 
ty  of  the  speech  acoustic  signal  and  its  linguistic  description.  The  approach 
distinguishes  between  the  dynamic  information  conveyed  by  an  act  and  the  stat¬ 
ic  information  in  a  symbol  string.  Thus,  linguistic  units  are  not  postulated 
as  part  of  the  infant's  native  endowment.  Rather  they  are  seen  as  elements 
that emerge from a self-organizing system of perceptuomotor control (cf. Lindblom,
MacNeilage, & Studdert-Kennedy, 1983).

1 Introduction

The  relation  between  script  and  speech  differs  among  the  various  ortho¬ 
graphic  categories.  In  general,  alphabets  maintain  a  closer  link  than  do 
logographies. Comparisons between instances of each category, say between English
and Chinese, are instigated in order to uncover whether or not different
orthographic  styles  might  be  reflected  in  differing  processing  strategies  used 
by  readers.  A  number  of  investigators  have  pointed  out,  however,  that  "alpha¬ 
bet"  does  not  constitute  a  monolithic  category  and  English  is,  in  no  sense,  to 
be  taken  as  typical  of  all  alphabets.  Nonetheless,  a  majority  of  the  reading 
data  have  been  collected  for  English  and  the  conclusions  they  suggest  have 
been  accepted,  more  or  less  by  default,  for  alphabets  in  general.  But  a  grow¬ 
ing  body  of  data  for  Serbo-Croatian,  the  (alphabetically  transcribed)  language 
of Yugoslavia, reveals important differences from English. We will summarize
these  data  and  elaborate  their  implications  for  linguistic  issues,  particular¬ 
ly  the  role  of  phonology  in  reading,  that  may  be  important  for  Chinese. 

2  Linguistic  Issues  in  Cross-language  Comparisons 

Orthographies  can  be  distinguished  along  a  number  of  dimensions,  two  of 
which  will  concern  us  here.  First,  they  differ  with  respect  to  the  particular 
units  that  are  overtly  represented,  be  they  morphemes  or  syllables  or  the  more 
(linguistically)  abstract  phonemes.  Second,  orthographies  can  be  considered 
deep  or  shallow  depending  on  their  relative  remoteness  from  the  sounds  to  be 
read.  As  will  be  illustrated  in  the  following  characterizations  of  Ser¬ 
bo-Croatian,  English,  and  Chinese,  these  dimensions  are  orthogo¬ 
nal — orthographies  of  "equal  depth"  can  differ  in  the  unit  represented. 

Serbo-Croatian  uses  an  alphabet  that  represents  phonemes  in  a 
straightforward symbol-to-sound mapping: each letter has only one pronunciation.
A novel word or pseudoword can be named (in the sense of pronounced)
simply  by  generating  the  sounds  from  the  letters.  A  letter  such  as  a  will  be 
pronounced  /a/  regardless  of  the  letters  that  precede  or  follow  it  (ignoring, 
of  course,  subtle  changes  as  a  consequence  of  coarticulation).  In  order  to 
preserve this mapping, the etymological relationships among words are sacrificed.
Wherever the spoken language has imparted phonological variation in,
say, declensions of a given noun, the variations are enforced in the spelling
(e.g., nominative singular RUKA, dative singular RUCI; nominative singular
SNAHA, dative singular SNASI). It is, therefore, considered to be a shallow
orthography  (Liberman,  Liberman,  Mattingly,  &  Shankweiler,  1980). 

In  contrast,  English  uses  an  alphabet  that  also  represents  phonemes  but 
enforces  morphological  continuity.  Where  the  spoken  language  changes  the 
pronunciation  of  a  root  morpheme,  its  spelling  does  not  necessarily  change. 
The sounds are determined by phonological rules with the result that etymological
hints are retained (e.g., the relationship between "bomb" and "bombard"
is  preserved  in  their  spellings  despite  alteration  in  the  sound  of  the  second 
"b").  A  novel  word  or  pseudoword  can  be  named  by  generating  the  sounds  from 
the  letters  and  phonological  rules.  An  alphabet  that  does  not  represent 
phonological  variations  that  are  determined  by  phonological  rule*  can  be  said 
to  be  deep. 

Finally,  Chinese  uses  a  logography  to  represent  morphemes.  Although  a 
large  proportion  of  characters  are  phonograms — comprising  both  a  semantic  and 
a  phonetic  component — the  hints  to  sound  are  not  completely  reliable  (Wang, 
1973). Using the phonetic component to sound out a character yields only 39%
accuracy  (Tzeng  &  Hung,  1980).  By  and  large,  therefore,  the  character  names 
must  be  memorized  in  order  to  be  read.  Because  of  the  opacity  of  the  phonolo¬ 
gy,  Chinese  can  be  considered  a  deep  orthography. 

The  fact  that  orthographies  differ  with  respect  to  both  the  units  they 
represent  and  the  phonological  transparency  of  those  representations  suggests 
that  orthographies  might  also  vary  in  the  linguistic  demands  that  they  place 
on  the  reader,  particularly  the  beginner.  In  other  words,  the  effective  use 
of  orthographies  might  depend  on  how  much  readers  know  about  the  structure  of 
their  languages,  with  certain  orthographies  requiring  an  explicit  understand¬ 
ing  of  the  more  abstract  (and,  presumably,  harder  to  come  by)  aspects.  Limit¬ 
ing  our  discussion  to  structural  units,  speaker-hearers  can  become  aware  of 
the  words,  morphemes,  syllables,  and  phonemes  that  comprise  their  spoken  lan¬ 
guage.  If  they  are  to  become  readers  of  that  language,  alphabets  require  an 
appreciation  of  the  phonemic  structure  that  logographies  do  not.  Whatever  the 
orthography,  the  level  of  linguistic  awareness  (Mattingly,  1972)  must  be 
compatible  with  the  units  represented,  while  using  the  orthography  might  be 
said  to  tune  one  to  the  level  of  awareness  demanded.  By  this  reasoning,  flu¬ 
ent  readers  of  Chinese  are  less  likely  to  be  aware  of  the  phonemic  structure 
of  their  language  than  are  fluent  readers  of  English  because  fluency  in  the 
morpheme-based  orthography  does  not  demand  such  awareness. 

A  similar  circular  causality  is  found  in  what  has  been  termed  phonologi¬ 
cal  maturity  (Liberman  et  al.,  1980),  the  appreciation  that  readers  have,  to 
varying  degrees,  of  the  (morpho-)phonological  rules  which  rationalize  spel¬ 
lings  that  are  related  complexly  to  sound.  That  is  to  say,  phonological 
maturity  helps  in  reading  words  where  phonological  variation  is  determined  by 
rule  rather  than  orthographic  representation  (e.g.,  real  is  read  /ril/, 
reality is read /ri.al'.at.e/); reading experience, in turn, promotes
phonological  development.  The  demands  of  linguistic  awareness  and  phonologi¬ 
cal  maturity  can  be  said  to  parallel,  more  or  less,  the  dimensions  we  identi¬ 
fied  as  distinguishing  orthographies — the  represented  unit  and  its  phonologi¬ 
cal  transparency,  respectively.* 

3 Serbo-Croatian: A Bi-alphabetic, Inflected Language

Phonological  transparency  is  only  one  characteristic  that  distinguishes 
Serbo-Croatian  from  English.  The  major  language  of  Yugoslavia  is  also  highly 
inflected.  Nouns,  pronouns,  and  adjectives  are  declined  in  seven  plural  and 
seven  singular  cases  (nominative,  locative,  dative,  instrumental,  genitive, 
accusative,  and  vocative).  Verbs  are  conjugated  by  person  and  number  in  six 
forms.  But,  because  of  the  dictum  to  "Write  as  you  speak  and  read  as  it  is 
written"  (the  guiding  principle  behind  the  mid-19th  century  alphabet  reforms 
directed by the Serbian language scholar Vuk Karadžić), root morphemes often
are  varied  orthographically  when  an  inflectional  element  is  added. 

Of  primary  relevance  to  transforming  the  linguistic  issues  of  the  last 
section  into  experimental  questions,  however,  is  the  fact  that  Serbo-Croatian 
is  written  in  two  alphabets.  Both  the  Cyrillic  script  (learned  first  in  east¬ 
ern  parts  of  the  country)  and  the  Roman  script  (learned  first  in  the  West)  map 
onto the same set of 30 phonemes but in an interesting way. While most letters
are unique to one or the other alphabet, seven are common (i.e., are read
the  same  way  in  the  two  scripts)  and  four  are  ambiguous  (i.e.,  receive  a  dif¬ 
ferent  phonetic  interpretation  in  each  script).  Since  Yugoslavs  are  typically 
facile  with  both  alphabets,  the  letters  can  be  combined  in  a  variety  of  ways 
for  experimental  purposes,  which  will  become  apparent  in  Section  5.0. 
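
The three letter classes just described lend themselves to a small sketch of how letter strings might be sorted for such experiments. The code below is an illustration under stated assumptions, not the procedure used in the studies reviewed here. The letter inventory is not given in the text above; it is taken from the standard uppercase printed forms of the two alphabets (common: A, E, O, J, K, M, T; ambiguous: B, C, H, P), and the phoneme symbols are rough ASCII stand-ins. The function names and the convention that shared letter shapes are typed as Latin characters are hypothetical.

```python
# Letter shapes read the same way in the Roman and Cyrillic alphabets.
COMMON = {"A": "a", "E": "e", "O": "o", "J": "j", "K": "k", "M": "m", "T": "t"}

# Letter shapes shared by both alphabets but read differently in each:
# shape -> (Roman reading, Cyrillic reading).
AMBIGUOUS = {
    "B": ("b", "v"),
    "C": ("ts", "s"),
    "H": ("x", "n"),
    "P": ("p", "r"),
}


def classify(string):
    """Sort a letter string by the classes of letters it contains."""
    letters = set(string.upper())
    shared_shapes = COMMON.keys() | AMBIGUOUS.keys()
    if not letters <= shared_shapes:
        # At least one letter is unique to one alphabet and fixes the reading.
        return "unambiguous"
    if letters & AMBIGUOUS.keys():
        # Only shared shapes, at least one with two readings.
        return "ambiguous"
    # Only common letters: a single reading in either alphabet.
    return "common"


def readings(string):
    """Return the (Roman, Cyrillic) readings of a string of shared shapes."""
    roman, cyrillic = [], []
    for letter in string.upper():
        if letter in COMMON:
            roman.append(COMMON[letter])
            cyrillic.append(COMMON[letter])
        elif letter in AMBIGUOUS:
            roman_sound, cyrillic_sound = AMBIGUOUS[letter]
            roman.append(roman_sound)
            cyrillic.append(cyrillic_sound)
        else:
            raise ValueError(f"{letter!r} is not a shared letter shape")
    return "".join(roman), "".join(cyrillic)


print(classify("BETAP"), readings("BETAP"))
print(classify("MAJKA"), readings("MAJKA"))
print(classify("VETAR"))
```

In this sketch a string such as BETAP is phonologically ambiguous because every letter shape belongs to both alphabets and some of them carry two readings, whereas the presence of even one unique letter, as in VETAR, settles the alphabet and hence the pronunciation.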

4  Assessing  Lexical  Access 

We  are  interested  in  whether  or  not  variations  in  the  speech-script  rela¬ 
tionship  promote  differing  processing  strategies  in  reading.  Since  reading 
involves  recognizing  words,  one  process  that  has  received  considerable  scruti¬ 
ny  is  the  pattern  recognition  step—how  is  a  written  letter  string  matched  to 
its  lexical  representation?  This  question  of  lexical  access  has  been  ad¬ 
dressed  with  (primarily)  two  paradigms:  (1)  In  lexical  decision  tasks,  sub¬ 
jects  must  decide  as  rapidly  as  possible  whether  or  not  a  given  letter  string 
is  a  word;  (2)  In  naming  tasks,  subjects  must  simply  read  the  letter  string 
aloud  as  rapidly  as  possible.  In  both  tasks,  the  time  transpiring  between  on¬ 
set  of  the  stimulus  and  initiation  of  the  response  is  measured.  Visual  and 
phonological  characteristics  of  the  letter  strings  are  varied  to  ascertain 
what  effect,  if  any,  they  have  on  the  response  latencies. 

Effects  on  lexical  decision  time  are  taken  to  have  implications  for  the 
nature  of  lexical  access,  models  of  which  include  linguistic  processes  (phono¬ 
logical  recoding  of  letter  strings),  nonlinguistic  processes  (simple  figural 
analyses),  and  combinations  of  both  (dual  processing).  Effects  on  naming  may 
be consistent with one or another lexical route or may suggest, further, that
the  lexicon  need  not  be  accessed  at  all  in  order  to  pronounce  a  letter  string. 
These  implications  rest  on  two  logical  underpinnings.  First,  if  a  letter 
string  is  phonologically  ambiguous  (i.e.,  can  be  pronounced  in  more  than  one 
way), then any phonological analysis (if it exists) ought to be hindered in
comparison  to  such  an  analysis  on  phonologically  unique  letter  strings.  This 
would  be  true  in  both  lexical  decision  and  naming.  If  phonological  ambiguity 
produces  no  effect,  the  case  for  phonological  analysis  is  undermined.  Second, 
while  the  three  general  models  of  word  processing  all  suggest  that  words 
should  be  named  faster  than  pseudowords,  a  phonologically  analytic  strategy 
ought  to  yield  a  fairly  small  difference  that  is  relatively  constant  for 
ambiguous  and  unambiguous  letter  strings.  An  interaction  between  lexicality 
and  phonological  ambiguity,  however,  would  seem  to  support  one  of  the  other 
models.  These  will  be  elaborated  in  Section  6.0. 
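
The second underpinning amounts to a comparison of differences in a 2 x 2 design (lexicality by phonological ambiguity), which a short sketch can make explicit. The latencies below are invented purely to illustrate the arithmetic and are not results from any experiment; the function names are likewise hypothetical. A real analysis would of course test these contrasts statistically rather than inspect cell means.

```python
# Hypothetical mean naming latencies in milliseconds (invented for illustration).
latency = {
    ("word", "unambiguous"): 600.0,
    ("pseudoword", "unambiguous"): 640.0,
    ("word", "ambiguous"): 680.0,
    ("pseudoword", "ambiguous"): 722.0,
}


def lexicality_effect(table, ambiguity):
    """Pseudoword minus word latency within one ambiguity condition."""
    return table[("pseudoword", ambiguity)] - table[("word", ambiguity)]


def ambiguity_effect(table, lexicality):
    """Ambiguous minus unambiguous latency within one lexicality condition."""
    return table[(lexicality, "ambiguous")] - table[(lexicality, "unambiguous")]


def interaction_contrast(table):
    """Difference between the lexicality effects in the two ambiguity conditions."""
    return lexicality_effect(table, "ambiguous") - lexicality_effect(table, "unambiguous")


print("ambiguity effect for words:", ambiguity_effect(latency, "word"))
print("lexicality effect, unambiguous:", lexicality_effect(latency, "unambiguous"))
print("lexicality effect, ambiguous:", lexicality_effect(latency, "ambiguous"))
print("interaction contrast:", interaction_contrast(latency))
```

With these made-up numbers, ambiguity slows responses while the word advantage is small and nearly the same in both ambiguity conditions, so the interaction contrast is close to zero. That is the shape of outcome the text associates with a phonologically analytic strategy; a large contrast would instead point toward one of the other models.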

Obviously, a great deal hinges on the manipulation of phonological
ambiguity.  In  English,  two  methods  have  been  used.  In  one,  pseudowords  are 
constructed  to  be  homophonic  with  words.  While  lexical  rejection  of 
pseudohomophones takes longer than rejection of pseudowords (Coltheart, Davelaar,
Jonasson, & Besner, 1977), at least for good readers (Barron, 1978),
interpretation  of  this  fact  is  tricky  because  the  appropriateness  of 
pseudohomophones has been questioned on a number of grounds (Feldman, Lukatela,
& Turvey, 1985; Martin, 1982). These include (i) the possibility that
phonetic  representations  may  be  sensitive  to  orthographic  differences  between 
letter  strings  that  sound  alike  when  spoken  aloud;  (ii)  the  formal  distinc¬ 
tion,  in  English,  between  phonetic  and  morphophonological  representations;  and 
(iii)  the  suspicion  that  pseudohomophones  are  structurally  odd. 

The  second  way  in  which  phonological  ambiguity  has  been  manipulated  in 
English  is  through  a  comparison  of  words  with  regular  and  irregular  (or  excep¬ 
tional)  pronunciations.  Whether  or  not  differences  are  found,  however, 
depends  on  how  regularity  is  defined  (Parkin,  1982).  For  example,  words  in 
which  each  graphemic  unit  receives  the  major  phonemic  correspondence  (as  de¬ 
tailed  in  Venezky's  [1970]  rules)  are  considered  regular  while  those  that 
receive  a  minor  correspondence  may  be  treated  as  irregular  (Coltheart,  Besner, 
Jonasson,  &  Davelaar,  1979).  A  finer  distinction  reveals  that  words  can  be 
classified as regular and consistent (i.e., they and all words that are visually
similar to them receive the major phonemic correspondences) or regular
and  inconsistent  (i.e.,  they  receive  major  correspondences  but  other  exemplars 
receive  minor  correspondences  and,  thus,  are  irregular  [Glushko,  1979]).  Some 
irregular  words  might  be  considered  especially  exceptional,  however,  if  only 
because  lexicographers  provide  pronunciation  guides  for  them  (but  not  for  all 
minor  correspondence  words  [Parkin,  1982]).  Moreover,  a  particular 
grapheme-phoneme  correspondence  will  be  considered  minor  and,  therefore, 
exceptional  because  there  are  fewer  instances  of  it  when,  in  fact,  those  in¬ 
stances  might  occur  with  greater  frequency  than  the  so-called  major 
grapheme-phoneme  correspondences  (Parkin,  1982).  Lastly,  phonologically 
irregular  words  may  differ  with  respect  to  whether  or  not  they  are  orthograph- 
ically  irregular  as  well  (Parkin  &  Underwood,  1983).  Depending  on  which  of 
these  characterizations  of  regularity  is  used,  one  will  or  will  not  find 
differences  between  regular  and  irregular  words,  either  supporting  or  belying 
claims  for  phonological  analysis. 
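
The regular/consistent distinction described above can be sketched in a few lines. This is only an illustration of the classification logic: the function name, its arguments, and the example inputs are assumptions made here, not materials from the cited studies, and the sketch ignores the other definitions of regularity (frequency of a correspondence, orthographic irregularity) raised by Parkin (1982).

```python
def classify_regularity(word_is_major, neighbours_are_major):
    """Label a word under the regular/consistent scheme.

    word_is_major: True if the word itself receives the major
        phonemic correspondences.
    neighbours_are_major: one boolean per visually similar word,
        True if that neighbour receives the major correspondences.
    """
    if not word_is_major:
        return "irregular"
    if all(neighbours_are_major):
        return "regular and consistent"
    return "regular and inconsistent"


# Hypothetical inputs, chosen only to exercise each branch.
print(classify_regularity(True, [True, True, True]))
print(classify_regularity(True, [True, False]))
print(classify_regularity(False, [True, True]))
```

Which words count as "visually similar" is itself a modeling decision, so the same word could change category under a different neighbourhood definition.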

As  important  as  the  phonological  manipulation  is  to  evaluating  lexical 
properties,  it  is  not  clear  that  studies  in  English  have  been  successful  in 
providing  unequivocal  tests.  The  task  is  much  more  straightforward  in  Ser¬ 
bo-Croatian,  however,  where  the  unique  properties  of  the  orthography  can  be 
exploited.  In  the  following  review,  we  will  focus  on  the  bi-alphabetism  of 
fluent  readers. 

5  Reading  in  Serbo-Croatian  Is  Phonologically  Analytic 

Because  Serbo-Croatian  is  phonologically  shallow,  there  are  no  minor 
phonemic  correspondences,  no  irregular  words  nor  inconsistent  regular  words, 
and no orthographically irregular words. Phonological ambiguity is manipulated
by choosing words (or nonwords) that combine common letters with unique
letters (unambiguous letter strings) or common letters with ambiguous letters
(ambiguous  letter  strings).  The  lexical  status  of  letter  strings  so  chosen 
will  depend  on  their  phonemic  interpretation — that  is,  in  which  alphabet  they 
are read. For example, an ambiguous string could be a word in Cyrillic but a
pseudoword in Roman (or vice versa). Or it could be one word in Cyrillic but
a  different  word  in  Roman  (or  pseudowords  in  both).  An  unambiguous  string 
could  be  a  word  in  one  alphabet  and  impossible  in  the  other  (or  a  pseudoword 
in one and impossible in the other). Finally, if composed exclusively of common
letters, a string would be the same word in both alphabets (or the same
pseudoword).
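
The three kinds of letter string just described can be told apart mechanically. The sketch below is a minimal illustration under stated assumptions: the letter inventories are the usual description of the uppercase printed alphabets and are not given in this passage, strings are typed with Latin stand-ins for the glyph shapes, and the function name is hypothetical.

```python
# Uppercase glyph shapes shared by the two alphabets (assumed inventory).
COMMON = set("AEOJKMT")   # same shape, same phonemic value in both alphabets
AMBIGUOUS = set("BCHP")   # same shape, different phonemic value


def classify_string(glyphs):
    """Classify a letter string by the kinds of letters it contains."""
    shapes = set(glyphs.upper())
    if shapes - COMMON - AMBIGUOUS:
        # At least one letter belongs to only one alphabet,
        # which fixes the alphabet in which the string is read.
        return "unambiguous"
    if shapes & AMBIGUOUS:
        return "ambiguous"
    return "common"


print(classify_string("KOBAC"))
print(classify_string("RUKA"))
print(classify_string("MAMA"))
```

Note that a single unique letter is enough to make a string unambiguous, however many ambiguous letters it also contains.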

In lexical decision tasks, comparisons of response times to the variety
of letter string types reveal a phonological ambiguity effect: an ambiguous
letter string takes longer to decide about than an unambiguous letter string.
This is true when it is (i) a word in one reading and a pseudoword in the other;
(ii) a word, though different, in both readings; and (iii) a pseudoword,
though different, in both readings (Lukatela, Popadić, Ognjenović, & Turvey,
1980; Lukatela, Savić, Gligorijević, Ognjenović, & Turvey, 1978). The effect
is more pronounced with words than pseudowords (Feldman & Turvey, 1983;
Lukatela et al., 1978). The greater the number of ambiguous letters in the
string, the longer lexical decision takes (Feldman, Kostić, Lukatela, & Turvey,
1983; Feldman & Turvey, 1983). While attempts to bias subjects toward a
Roman reading by instructions or task (i.e., uniquely Cyrillic letters never
appear) did not eliminate the effect, the presence of a single unique character
did (Feldman et al., 1983; Lukatela et al., 1978). Finally, the effect is
more pronounced in good readers than in poor readers (Feldman et al., 1985),
suggesting  that  those  who  more  effectively  exploit  the  phonologically  analytic 
strategy  are  harmed  more  by  ambiguity. 

It  is  important  to  note  that  the  phonological  ambiguity  effect  is  not  an 
artifact  of  the  frequency  of  ambiguous  letter  strings.  These  occur  regularly 
in  the  Serbo-Croatian  language.  But  the  point  is  underscored  nicely  by  two 
experimental  findings.  First,  in  a  comparison  of  two  inflected  forms  of  the 
same noun, frequency is (at one level) equal since they are the same word
(e.g., RUKA and RUCI both mean hand). But the occurrence of the various
grammatical cases differs such that nominative singulars (e.g., RUKA) are at
least ten times more frequent than dative singulars (e.g., RUCI). When both
forms  are  unique  letter  strings,  the  latency  for  nominatives  is  (about  80  ms) 
shorter.  When  the  nominative  singular  is  ambiguous  and  the  dative  singular  is 
unambiguous  (i.e.,  has  one  unique  character),  latency  for  datives  is  (about 
185  ms)  shorter  (Feldman  et  al.,  1983).  Phonological  ambiguity  overrides  the 
frequency  advantage. 

The  second  rejoinder  to  frequency  arguments  comes  from  a  comparison  of 
words that are ambiguous in one alphabetic transcription but unique in the
other. For example, the Cyrillic version of "hawk," КОБАЦ, is unique
(pronounceable only as /kobats/) while its Roman version, KOBAC, is ambiguous
(pronounced /kobats/ if read as Roman but /kovas/ if read as Cyrillic). With
such pairs, a word can be used as its own control: Frequency, meaning,
length, and number of syllables are identical. Only the number of morphophonological
representations is different, but that is sufficient to produce a 350 ms
difference  in  decision  time  (Feldman,  1981). 
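
The KOBAC example can be worked through as a small sketch that derives one phonemic reading per alphabet. The two tables are partial and hypothetical: they cover only the glyphs needed here, use Latin stand-ins for the shared shapes, and the function name is an assumption made for illustration.

```python
# Phonemic values for a handful of shared glyph shapes (partial, assumed).
ROMAN = {"K": "k", "O": "o", "B": "b", "A": "a", "C": "ts"}
CYRILLIC = {"K": "k", "O": "o", "B": "v", "A": "a", "C": "s"}


def readings(glyphs):
    """Return the set of distinct readings of a string of shared glyphs."""
    as_roman = "".join(ROMAN[g] for g in glyphs)
    as_cyrillic = "".join(CYRILLIC[g] for g in glyphs)
    return {as_roman, as_cyrillic}


print(sorted(readings("KOBAC")))  # two readings: the string is ambiguous
print(sorted(readings("KAO")))    # one reading: only common letters
```

A string made only of common letters collapses to a single reading, which is why such strings serve as unambiguous material in either alphabet.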

6 Word-pseudoword Comparisons

As indicated in Section 4.0, the three general models of word processing
agree that words should be named faster than pseudowords. Their reasons are
quite different, however, as are the particulars of how lexicality might interact
with phonological ambiguity. A model of visual analysis suggests that
words and pseudowords are read aloud by a common analogical process. Very
roughly,  a  word  finds  a  perfect  analogy  in  the  lexicon,  with  a  singularly  de¬ 
fined  code  for  pronunciation;  a  pseudoword  finds  several  analogies  in  the 
lexicon,  defining  several  alternative  pronunciations.  The  competition  among 
lexical  entries  induced  in  the  case  of  pseudowords  would  account  for  their 
slower naming relative to words (e.g., Glushko, 1979; Kay & Marcel, 1981).
The  effects  of  such  competition  ought  to  be  especially  (perhaps  exclusively) 
apparent  in  experiments  that  compare  phonologically  ambiguous  letter  strings. 

A  model  of  phonological  analysis  holds  that  words  and  pseudowords  are 
read  aloud  by  a  common  phonological  strategy  that  uses  spelling-to-sound  rules 
(based  on  the  same  principle  as,  though  not  necessarily  identical  to,  the 
grapheme-to-phoneme correspondences identified by Venezky [1970]). Very
roughly, the more regular the letter string the more rapid the recoding. As a
rule, pseudowords will be less phonologically regular than words, resulting in
slower naming latencies (e.g., Parkin, 1982; Parkin & Underwood, 1983). This
residual  difference  should  not  change  when  both  types  of  letter  strings  are 
chosen  to  be  purposely  ambiguous. 

Finally,  a  dual  process  view  asserts  that  words  are  read  aloud  by  a  visu¬ 
ally  based  look-up  of  a  word's  lexical  representation  where  the  word's 
pronunciation  can  be  retrieved.  In  contrast,  pseudowords  are  read  aloud  by 
assembling  a  pronunciation  on  the  basis  of  grapheme-phoneme  correspondences. 
It  is  hypothesized  that  visual  access  is  faster  than  rule-based  assembly; 
consequently,  words  are  named  more  rapidly  than  pseudowords  (e.g.,  Coltheart, 
1978;  Coltheart  et  al.,  1979).  Phonological  ambiguity  should  affect  only 
pseudowords  since  their  names  alone  are  derived  phonologically. 

In  Serbo-Croatian,  at  least,  it  appears  that  the  difference  in  naming 
latencies  between  words  and  pseudowords  does  not  change  when  phonological 
ambiguity is manipulated (Feldman, 1981). Both are slowed by about 450 ms
when  the  letter  strings  can  be  read  in  two  ways,  suggesting  that  phonological 
involvement  is  the  same  for  words  and  pseudowords.  Certainly,  this  strategy 
is  encouraged  by  the  fairly  direct  correspondence  to  speech  that  the  Ser¬ 
bo-Croatian  orthographies  exhibit.  One  might  expect  a  different  pattern  with 
English,  where  the  correspondence  between  orthography  and  speech  is  abstract. 
While  English  and  Serbo-Croatian  have  not  been  compared  directly  (i.e.,  in  the 
same  experiment  with  the  same  controls)  on  the  lexicality-ambiguity  interac¬ 
tion,  the  direct  comparisons  that  have  been  performed  reveal  differences  be¬ 
tween  the  languages  that  are  germane  to  this  issue.  Since  these  involve  a 
manipulation — semantic  priming — that  we  have  not  yet  discussed,  we'll  take  a 
moment  to  describe  its  logic  before  summarizing  the  results. 

It  is  commonly  found  that  lexical  decision  and  naming  are  facilitated 
when the target word is preceded by a semantically related priming word (Becker
& Killion, 1977; Massaro, Jones, Lipscomb, & Scholz, 1978; Meyer,
Schvaneveldt, & Ruddy, 1975). The general assumption is that when the prime
activates  its  own  lexical  representation,  that  activation  spreads  to  semanti¬ 
cally  related  items,  thereby  speeding  their  subsequent  lexical  processing. 
Tasks  that  are  lexically  mediated  ought  to  be  facilitated;  tasks  that  are  not 
facilitated  are  unlikely  to  be  lexically  mediated. 

Semantic  priming  of  lexical  decision  is,  in  fact,  found  in  both  English 
and  Serbo-Croatian  (Katz  &  Feldman,  1983).  For  naming,  however,  facilitation 
is found only for English, suggesting that naming in the phonologically shallow
Serbo-Croatian orthography need not involve the lexicon. This point is
underscored  by  the  correlations  between  lexical  decision  and  naming  (which  may 
be  taken  as  an  index  of  processing  similarity).  In  English,  performance  on 
semantically  primed  lexical  decision  correlates  with  naming,  whether  the  lat¬ 
ter  is  semantically  primed  or  not;  lexical  decision  without  semantic  priming 
also  correlates  with  naming,  whether  primed  or  not.  In  Serbo-Croatian,  the 
only  significant  correlation  occurred  when  neither  task  was  semantically 
primed.  "The  similarity  between  tasks  is  strongest  when  there  is  least  in¬ 
volvement  of  the  internal  lexicon"  (Katz  &  Feldman,  1983,  p.  163). 

7  Conclusion 

The  case  for  phonological  analysis  as  the  primary,  nonoptional  reading 
strategy  in  Serbo-Croatian  is  quite  strong.  It  is  not  yet  clear,  however, 
whether  or  not  this  strategy  is  peculiar  to  Serbo-Croatian  (or  writing  systems 
with  similar  properties):  Does  phonological  analysis  result  from  experience 
with  a  shallow  orthography  (i.e.,  does  orthography  influence  processing)  or  is 
it  simply  easier  to  demonstrate  in  the  sorts  of  experiments  that  the  Ser¬ 
bo-Croatian  orthography  allows? 

As strongly as we argue for a phonologically analytic strategy in Serbo-Croatian,
others have claimed that Chinese characters can only be read via
the  visual  route.  Indeed,  lexical  decision  is  slowed  by  a  visual  manipulation 
wherein  the  internal  components  of  two-character  words  (and  nonwords)  are 
distorted disproportionately [example characters illegible] (Hung,
Tzeng, Salzman, & Dreher, 1984). This parallels the result for mixing upper
and  lower  case  letters  in  English  (e.g.,  Coltheart  &  Freeman,  1974)  but  is  in 
contrast  to  mixing  Cyrillic  and  Roman  letters  in  Serbo-Croatian.  The  latter 
slows neither lexical decision nor naming (Feldman & Kostić, 1981; Katz &
Feldman,  1981).  Interestingly,  however,  visual  distortion  in  both  Chinese  and 
English  affects  poor  readers  more  than  good  readers  (Hung  et  al.,  1984).  This 
is puzzling if one assumes that the manipulation interferes with the putatively
optimal strategy on which better readers ought to be more reliant. Serbo-Croatian,
at least, follows the expected logic for a phonologically analytic
strategy: good readers are hurt more by phonological ambiguity (Feldman et
al., 1985).

We  do  not  know  if  fluent  readers  of  Chinese  rely  on  some  strategy  other 
than  visual  analysis  or  if  they  can  resort  to  some  other  strategy  if  the  visu¬ 
al  route  is  hindered.  We  do  know  that  there  are  hints  of  some  phonological 
analysis of Chinese characters. Detection of graphemic components (e.g.,
one pronounced /tai/) is more successful when the component carries a phonetic
clue (as in a character pronounced /tai/) than when it does not (as in one
pronounced /yi/ [Hung & Tzeng, 1981]). Inconsistent
characters take longer to name than consistent characters (where consistency
is defined by the ratio of exemplars pronounced the same as the target
to the total number of characters with that phonetic, regardless of how
they are pronounced [Fang & Horng, this volume]). And a comparison of
Japanese  kanji  (the  logographic  script  borrowed  from  Chinese)  with  kana  (a 
syllabary  that  depicts  the  phonetic  value  of  its  characters)  reveals  that 
colors  are  named  faster  when  written  in  kana  even  though  color  names  appear 
more frequently in kanji in Japanese literature (Feldman & Turvey, 1980;
cf. Saito, 1981). This last finding, especially, seems troublesome for those
models that restrict the role of phonological analysis. Phonological involvement
is demonstrated for words (not just pseudowords) and it appears to
facilitate, rather than slow, naming. One might argue that if phonological
analysis is optional, then it is an option readily (eagerly?) exploited when
available, even in writing systems that are biased, by design and practice, in
favor of visual analysis (cf. Brooks, 1977).
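
The consistency measure mentioned above is a simple ratio, sketched here as a minimal illustration. Several details are assumptions rather than anything stated in the text: the function name, the choice to count the target itself among the exemplars, the neglect of tone, and the made-up pronunciations in the example.

```python
def consistency(target_pronunciation, family_pronunciations):
    """Ratio of characters pronounced like the target to all characters
    that share its phonetic component.

    family_pronunciations lists one pronunciation per character
    containing the phonetic; here the target is assumed to be included.
    """
    same = sum(1 for p in family_pronunciations if p == target_pronunciation)
    return same / len(family_pronunciations)


# Hypothetical family of four characters sharing one phonetic.
family = ["tai", "tai", "yi", "tai"]
print(consistency("tai", family))  # the more consistent pronunciation
print(consistency("yi", family))   # the less consistent pronunciation
```

On this definition a character is fully consistent (ratio of 1) only when every character with the same phonetic is pronounced alike.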

Footnotes

¹This is akin to Klima's (1972) third convention.

²We find these parallels to be pedagogically useful but they may be
idiosyncratic and should not be taken as representative of how linguistic demand
is characterized typically. For example, Mattingly (1984) has recently
revised his distinction of phonological maturity and linguistic awareness as
entailing grammatical knowledge and access to such knowledge, respectively.
We are less able to use this distinction for our present purpose of classifying
orthographies.


THE  RELATIONSHIP  BETWEEN  KNOWLEDGE  OF  DERIVATIONAL  MORPHOLOGY  AND  SPELLING 
ABILITY  IN  FOURTH,  SIXTH,  AND  EIGHTH  GRADERS 


Joanne F. Carlisle†

Abstract.  This  study  investigated  young  students'  knowledge  of 
derivational  morphology  and  the  relationship  between  this  knowledge 
and  their  ability  to  spell  derived  words.  The  subjects  (fourth, 
sixth,  and  eighth  graders)  were  given  the  Wide  Range  Achievement 
Test, Spelling subtest, and several experimental tasks: 1) a test of
their ability to generate base and derived forms orally; 2) a
dictated spelling test of the same base and derived words; and 3) a
test of their ability to apply suffix addition rules. The results
indicate strong developmental trends in both the mastery of derivational
morphology and the spelling of derived forms; however, spelling
performances lagged significantly behind the ability to
generate  the  same  words.  Success  generating  and  spelling  derived 
words  depended  on  the  complexity  of  the  transformations  between  base 
and  derived  forms.  Further,  mastery  of  phonological  and  orthograph¬ 
ic  transformations  most  strongly  distinguished  the  three  grades  in 
both  spelling  and  generating  derived  forms.  Other  indications  that 
the  older  students  were  using  knowledge  of  morphemic  structure  in 
spelling  derived  forms  were  found  in  analysis  of  the  spelling  of 
base  and  derived  word  pairs  and  the  application  of  suffix  addition 
rules.  However,  incomplete  mastery  of  the  phonological  and  ortho¬ 
graphic  transformations  suggests  that  students  might  benefit  from 
explicit  instruction  in  morphemic  structure  in  order  to  improve 
their  spelling  of  derived  words. 

Introduction 

It  is  commonly  acknowledged  that  learning  to  spell  English  words  requires 
an  understanding  of  the  relationships  between  phonemes  and  graphemes  and  a 
memory  for  those  words  or  parts  of  words  that  are  "irregular."  However,  since 
our  orthography  is  morphophonemic,  it  seems  reasonable  to  believe  that  a 
knowledge  of  the  morphemic  structure  of  words  would  be  helpful,  perhaps  even 
necessary,  to  spell  accurately  the  many  words  of  more  than  one  morpheme  that 
we  use  in  writing.  Although  we  know  that  understanding  morphology  develops 
gradually from childhood to adulthood, little is known about the extent to
which this knowledge helps an individual acquire proficiency in spelling. The
present  study  is  concerned  with  the  spelling  of  derived  forms  and  addresses 
the  question,  is  there  a  relationship  between  knowledge  of  derivational 
morphology  and  spelling  ability? 

Although  the  relationship  between  morphological  knowledge  and  the 
acquisition  of  spelling  skill  would  seem  to  have  educational  relevance,  there 
have  been  very  few  investigations  of  the  matter.  The  paucity  of  research 
studies  is  surprising  since  quite  a  few  theorists  have  suggested  that 
sensitivity  to  morphemic  structure  should  enhance  the  ability  to  spell  English 
words (Frith, 1980; Henderson, 1982; Liberman, Liberman, Mattingly, & Shankweiler,
1980; Mattingly, 1980; Venezky, 1970) and that explicit instruction in
the  morphemic  structure  of  words  could  have  benefits  for  the  student  learning 
to  spell  derived  words  (Chomsky,  1970;  Russell,  1972). 

The  learning  of  derivational  morphology  is  a  complex  matter.  Although 
not  a  necessary  part  of  the  grammar  of  the  language,  the  affixes  allow  us  to 
express a concept (e.g., love) in a number of different grammatical forms,
usually while retaining the basic identity of the base form (e.g., lovable,
lovely, loveliness). While having familiar morphemes in many different words
offers  ease  and  efficiency  in  conveying  meaning,  this  benefit  accrues  only  if 
we  are  able  to  appreciate  the  morphological  relationship  between  different 
words  in  the  same  word  family.  Unfortunately,  the  distance  between  base  and 
derived  form  in  phonology  and  semantics  can  sometimes  be  a  formidable  barrier. 
As  Klima  (1972)  suggests,  it  is  questionable  whether  most  adult  speakers  of 
English  recognize  the  many  relatively  obscure  morphological  relationships  that 
exist  in  the  English  language.  How  many,  for  example,  are  aware  that  crux  and 
crucial  are  members  of  the  same  word  family? 

Both  the  range  and  the  complexity  of  the  phonological  transformations 
from  base  to  derived  forms  may  make  derivational  relationships  hard  to 
appreciate.  While  Chomsky  and  Halle  (1968)  have  proposed  that  the  phonologi¬ 
cal  changes  from  base  to  derived  forms  are  orderly  and  ruleful,  a  number  of 
researchers  have  questioned  the  psychological  reality  of  the  underlying  phono¬ 
logical  rule  system  (Barganz,  1971;  Jaeger,  1984;  Moskowitz,  1973;  Steinberg, 
1973;  Templeton,  1980).  Collectively,  these  suggest  that  children  and  adults 
have  varied  degrees  of  understanding  of  the  underlying  phonological  rule  sys¬ 
tem. 

Several  characteristics  of  derivational  morphology  make  productive  knowl¬ 
edge  problematic.  First,  the  construction  of  derived  forms  does  not  follow 
consistent  patterns.  For  example,  two  quite  similar  words  such  as  terror  and 
horror  have  only  some  of  the  same  derived  forms  (Richardson,  1977).  They  have 
in  common  terrible  and  horrible,  terrify  and  horrify;  on  the  other  hand,  there 
is terrorize, but not horrorize and horrid but not terrid. Second, the range
of syntactic options makes learning the proper derived forms complex. Derived
nouns, for example, can end in -ity, -ment, -ness, -ence, and -th, just to
name a few variations. In some cases, a base word has several
derived forms of the same part of speech, such as honestness and honesty or
bountiful  and  bounteous.  Third,  differences  in  the  meanings  of  the  suffixes 
are  often  subtle  or  nonexistent.  In  fact,  the  same  suffix  can  have  different 
meanings,  depending  on  the  word  it  is  attached  to  (Thorndike,  1941).  For 
example,  the  suffix  -ful  has  different  meanings  in  the  words  cupful  and  help¬ 
ful.  Finally,  derived  forms  sometimes  undergo  semantic  shifts  that  make  their 
relationship to their base forms seem remote. This is the case with apply and
appliance. When similarity in meaning is absent, the realization that the
base  and  derived  forms  are  related  requires  more  linguistic  sophistication 
than  many  individuals  have. 

In  view  of  such  complexities,  it  is  possible  that  the  learning  of 
word-specific  patterns  may  play  an  important  role  in  the  learning  of  deriva¬ 
tional  morphology.  Awareness  of  the  morphological  relatedness  of  words  and 
the  ability  to  analyze  morphemic  structure  may  depend  on  combined  features  of 
phonological  and  semantic  similarities  and  associations,  on  linguistic 
sophistication  and  even  on  the  specific  characteristics  of  the  language  tasks 
used to assess this ability (Derwing & Baker, 1979; Smith & Sterling, 1982).

The  Development  of  Knowledge  of  Derivational  Morphology 

Children  learn  inflected  forms  of  words  rulefully.  Their  knowledge  of 
most  inflectional  rules,  evident  from  the  ability  to  supply  the  correct  forms 
of nonsense words in sentences, is generally complete by the time they are seven
years old (Berko, 1958). Derivational rules, however, are learned more
slowly and less systematically than inflectional rules. Children's vocabulary
growth during the years 7 to 12 includes many words of complex morphological
structure, particularly derived forms (Ingram, 1976). To some extent,
morphophonemic rules appear to be learned during this time (Moskowitz, 1973).
However,  the  productive  knowledge  of  even  basic  derivational  forms  may  not  be 
complete  even  for  teenagers  (Selby,  1972).  In  fact,  since  derivational 
morphology  is  an  open  system,  learning  derived  forms  can  take  place  throughout 
adulthood  for  individuals  who  have  some  curiosity  about  words  (Klima,  1972). 

Although  derivational  morphology  cannot  be  said  to  be  mastered  within  a 
particular  developmental  period,  certain  developmental  trends  in  ruleful 
learning  of  derived  forms  have  been  found.  Using  a  task  modeled  after 
Berko's,  Derwing  (1976)  found  a  consistent  trend  among  children  (ages  8  to 
12),  adolescents,  and  adults  toward  productive  knowledge  of  five  of  the  six 
derivational  patterns  he  selected  for  investigation.  These  were  the  agentive 
-er, the adjective, noun compound, instrumental -er, and the -ly adverb.
(The  sixth  pattern  was  the  diminutive,  which  did  not  become  productive.)  The 
developmental  trend  toward  mastery  found  in  this  study  suggests  that  the 
learning  of  derived  forms  begins  soon  after  age  seven  when  the  inflected  forms 
have usually been mastered, a phenomenon also evident from Moskowitz's study
(1973). 

The  constructions  that  Derwing  found  to  be  productive  are  regular  and 
quite  transparent.  The  base  word  remains  intact  in  the  derived  form  and  does 
so  without  requiring  a  change  in  the  phonology  of  the  base  word.  Not  all 
derivational  relationships  are  so  regular  in  construction  or  so  closely  relat¬ 
ed  in  phonology  and  orthography  (Berko,  1958).  There  is  less  evidence  to  sug¬ 
gest  that  children  have  productive  knowledge  of  those  forms  with  complex 
phonological  and  semantic  relationships. 

Morphological  Knowledge  and  Spelling  Ability 

English  orthography  maps  onto  the  morphophonology  of  the  language.  Chom¬ 
sky  and  Halle  (1968)  note  that  where  changes  in  pronunciation  from  a  base  to  a 
derived  word  are  predicted  by  the  regular  sound  pattern  of  the  language,  the 
orthography does not need to reflect the change (e.g., race to racial and reduce
to reduction). A number of studies have shown that the orthographic
regularities seem to provide the reader with clearer clues to morphological
relationships than the underlying phonological rules (Barganz, 1971; Jaeger,
1984; Jarvella & Snodgrass, 1974; LaSorte, 1980; Moskowitz, 1973; Steinberg,
1973;  Templeton,  1980).  The  reader  who  can  discover  from  the  regularity  of 
the  spelling  that  two  words  are  morphologically  related  can  use  this  knowledge 
to  good  advantage  through  efficient  processing  of  words  and  through  apprecia¬ 
tion  of  semantic  relationships  and  syntactic  variations.  It  is  not  surpris¬ 
ing,  therefore,  that  there  appears  to  be  quite  a  strong  relationship  between 
morphological  knowledge  and  reading  or  vocabulary  development  (Barganz,  1971; 
Freyd  &  Baron,  1982;  LaSorte,  1980). 

The  issue  we  are  addressing  here,  however,  is  not  whether  orthographic 
regularities  help  the  reader,  but  whether  they  are  useful  to  the  spell¬ 
er — whether  knowledge  of  the  morphemic  structure  of  words,  which  may  be  more 
apparent  from  the  orthography  than  the  phonology,  is  drawn  upon  by  the  speller 
of  derived  words.  Reading  and  spelling,  though  closely  related,  are  quite 
different  tasks  (Frith,  1980).  C.  Chomsky  (1970)  argues  that  the  use  of 
orthographic  knowledge  to  spell  derived  words  correctly  is  a  natural  develop¬ 
ment,  at  least  for  the  good  speller  who  can  recall  the  orthographic 
similarities  of  related  words,  even  when  the  pronunciations  are  dissimilar. 
She  suggests  that  the  spellers'  knowledge  of  word  families  can  help  disambigu¬ 
ate  such  troublesome  elements  as  the  spelling  of  an  unstressed  vowel,  as  in 
democracy  (where  knowing  democrat  helps)  or  a  silent  consonant,  as  in  muscle 
(where  knowing  muscular  helps).  Russell  (1972)  believes  that  the  phonological 
and  orthographic  regularities,  apparent  from  reading  words,  can  be  emphasized 
in  instruction  in  spelling.  However,  neither  Chomsky  nor  Russell  offers  di¬ 
rect  evidence  to  support  the  position  that  knowledge  of  morphological  struc¬ 
ture  helps  the  speller  spell  derived  words  correctly. 

While  studies  of  the  spelling  of  young  children  give  some  indication  of  a 
growing  awareness  of  morphemic  structure  (Marino,  1979;  Rubin,  1984;  Schwartz 
&  Doehring,  1977),  we  do  not  know  if  an  awareness  of  simple  morphemic  struc¬ 
ture  carries  over  to  the  spelling  of  derived  forms,  particularly  those  that 
undergo  phonological  or  orthographic  shifts.  How  well  an  individual  speller 
can  apply  morphological  knowledge  to  the  task  of  spelling  may  depend  on  how 
explicit  as  well  as  how  extensive  this  knowledge  is.  It  may  also  depend  on 
the  speller's  mastery  of  the  orthographic  conventions  that  govern  the  addition 
of  suffixes  to  base  words. 

Two  studies  have  looked  at  the  spellers'  ability  to  use  morphological 
knowledge.  One  is  an  investigation  of  the  use  of  phonological  knowledge  and 
orthographic  knowledge  in  a  dictated  spelling  task  (real  and  nonsense  words) 
involving  good  spellers  at  the  sixth-,  eighth-,  and  tenth-grade  levels  (Tem¬ 
pleton,  1980).  The  results  of  this  study  suggest  that  seeing  a  base  word 
prompted  better  recall  of  the  phonological  rules  governing  the  spelling  of  de¬ 
rived  forms  than  hearing  the  base  word.  In  addition,  the  students  could  spell 
the  nonsense  derived  words  better  than  they  could  pronounce  them.  Templeton 
suggested  that  learning  about  the  orthographic  structure  of  derived  words 
might  bring  about  a  more  comprehensive  and  productive  awareness  of  the  under¬ 
lying  phonological  rules. 

The  second  study  of  spellers'  sensitivity  to  morphemic  structure  was  of 
good  and  poor  spellers  at  the  college  level  (Fischer,  Shankweiler,  &  Liberman, 
1985).  Good  spellers  were  much  better  than  poor  spellers  at  spelling 
morphophonemically complex words. This discrepancy was particularly striking
because the two groups differed less in their ability to spell words that are
orthographically transparent (adverb) or orthographically deviant (Fahrenheit).
Performances on additional tasks suggested that differences in spelling
morphophonemically complex words were attributable to differences in
linguistic  knowledge,  specifically  knowledge  of  morphological  structure.  The 
good  spellers  were  superior  to  the  poor  spellers  on  nonsense  tasks  of  prefixa¬ 
tion  and  suffixation,  suggesting  that  they  were  not  simply  better  spellers  of 
real  words. 

While  these  two  studies  seem  to  indicate  that  good  spellers  between  the 
sixth  grade  and  college  level  can  use  morphological  knowledge  to  help  them 
spell  derived  words,  this  pattern  may  not  hold  for  poor  spellers  at  these  lev¬ 
els  in  school  or  for  younger  students.  Spelling  errors  made  by  junior  high 
school  students  have  been  observed  to  indicate  lack  of  awareness  of  morphemic 
structure (e.g., easally for easily) (Carlisle, 1984). Similarly, in an analysis
of spelling errors on compositions, Sterling (1983) found that
12-year-old students treated derived words as if they were monomorphemic
words. His analysis of the students' spelling errors indicated that inflected
forms were spelled by a ruleful system, but derived forms were spelled as unanalyzed
wholes. He suggested that access to the knowledge of morphological relationships
may be obscured by the complex nature of derivational morphology.

Experiment 

The general purpose of the present study was to investigate the early
stages of acquisition of knowledge of derivational morphology and of the
ability to spell derived words. Several different considerations guided the
formulation of the questions and the design of the study. First, on the basis
of investigations by Berko (1958), Derwing and Baker (1979), and Selby (1972),
it was expected that learning derivational morphology would begin in the third
or fourth grades, following the mastery of the inflected forms. Accordingly,
students in the fourth, sixth, and eighth grades were chosen as subjects in
order to provide insight into the developmental mastery of derivational
morphology. Second, the study was based on the hypothesis that students do
acquire ruleful knowledge of the derivational morphology and that they do not
simply learn to spell derived forms as unanalyzed whole words.

The research questions were as follows: First, are there developmental
trends between the fourth and eighth grades in the acquisition of
morphological knowledge and knowledge of the spelling of derivatives? Second,
is there a relationship between the knowledge of derivational morphology and
the ability to spell derived forms in the fourth, sixth, and eighth grades?
Third, is there evidence that the learning of derivational morphology and the
spelling of derived forms is ruleful in nature, taking into account both
phonological and orthographic transformations?

In order to investigate these issues, two tasks were devised to allow for
direct comparisons of the two skills: an oral test of the ability to generate
derived forms and a dictated spelling test using the same words. The words
were chosen to include four possible relationships between base forms and
derived forms, on the assumption that these would engender errors that would
reflect different levels of mastery of phonological and orthographic rules.
Included were (a) word pairs in which there is NO CHANGE in the phonology or
orthography (e.g., enjoy and enjoyment), (b) pairs in which there is a
PHONOLOGICAL CHANGE but no orthographic change (e.g., major and majority),
(c) pairs in which there is an ORTHOGRAPHIC CHANGE but no phonological change
(e.g., rely and reliable), and (d) pairs in which phonology and orthography
BOTH CHANGE (e.g., reduce and reduction).

In  developing  these  tests  to  address  the  research  questions,  we 
anticipated  two  particular  patterns  of  results.  First,  on  the  question  of  the 
developmental  trends  of  morphological  knowledge  and  spelling  ability,  we 
expected  performance  on  the  dictated  spelling  test  to  lag  behind  performance 
on  the  test  of  oral  generation  (called  the  Test  of  Morphological  Structure), 
since  the  development  of  morphological  knowledge  most  likely  precedes  the 
ability  to  use  this  knowledge  in  spelling.  Second,  on  the  question  of  the 
ruleful  nature  of  learning  the  morphology  and  spelling  of  derived  words,  we 
expected  that  the  words  undergoing  phonological  and  both  phonological  and 
orthographic  changes  would  present  more  difficulty  than  the  words  with  more 
transparent  relationships  (those  undergoing  no  change  or  just  orthographic 
change). This expectation was based on the finding of various research
studies that the more remote the relationship between base and derived forms,
the more difficult it is to learn the relationship rulefully (Berko, 1958;
Derwing, 1976; Derwing & Baker, 1979; Moskowitz, 1973; Templeton, 1980).

A final consideration reflected the nature of orthographic rules and the
derived words whose spelling is governed by these rules. The spelling of such
words draws on a somewhat different kind of "ruleful" learning: the
conventions of our spelling system. While a knowledge of the morphological
components of words such as "sunny" would make the task of spelling easier,
specific knowledge of the conventions of spelling words with suffixes (such as
the rules governing the doubling of consonants) would also seem to be helpful,
if not necessary. An exploratory study of the mastery of suffix addition rules
between the seventh and ninth grades showed that words with suffixes made up
more than half the errors in the students' compositions (Carlisle, 1984).
Therefore, a test was devised that would help determine whether the students
were able to apply the suffix addition rules consciously. Since the
orthographic changes could be memorized as word-specific spellings, this test
required the addition of suffixes to nonsense words. On the premise that
mastery of the suffix addition rules is dependent on knowledge of morphemic
structure and knowledge of abstract generalizations, the students' ability to
apply these spelling rules was expected to develop later than their knowledge
of morphological structure.


Method 


Subjects 

The  subjects  were  65  students  from  the  fourth,  sixth,  and  eighth  grades 
of  a  rural  Connecticut  school  system.  The  22  sixth  graders  and  the  21  eighth 
graders  came  from  classes  studying  language  arts  and  literature.  These  were 
selected  by  the  teachers  on  the  basis  of  class  size  and  availability  of  time. 
The fourth-grade group was made up of 22 students from two elementary
classrooms. All subjects were judged by their teachers as having at least
average intelligence.

Procedures 


The Spelling subtest of the Wide Range Achievement Test (WRAT) was
administered to each grade-level group (Jastak & Jastak, 1978). Within a week
the Derived Forms subtest of the Spelling Test was administered to each
grade-level group. One week later the Base Forms subtest of the Spelling Test
and the Test of Suffix Addition were administered. Two weeks after the
administration of the Derived Forms subtest of the Spelling Test, the Test of
Morphological Structure was administered to each student individually.

Materials 

1. Wide Range Achievement Test, Spelling Subtest (Jastak & Jastak,
1978). This dictated spelling test was administered to determine the spelling
capabilities of the subjects and to evaluate the validity of the experimental
spelling tests. Level I was administered to the fourth-grade group, and Level
II to the sixth- and eighth-grade groups. Level II, the appropriate form to
use with youngsters aged 12.0 and over, was given to all sixth graders even
though some of them were not yet 12 in order to permit group administration of
the test and to ensure an accurate comparison of spelling abilities within the
sixth-grade group. The test was administered in accordance with the
directions for group administration. The students' performance on the WRAT
Spelling subtest yielded the following grade-equivalent scores: fourth grade,
5.9 (standard deviation of 1.0); sixth grade, 6.7 (standard deviation of 1.4);
and eighth grade, 9.4 (standard deviation of 1.3). The correlation between the
students' performances on the WRAT Spelling subtest and on the Derived Forms
subtest of the Spelling Test (described hereafter) was .64 (p < .001).

2. Test of Morphological Structure. This experimental test was designed
to assess knowledge of derivational morphology. The test has two subtests.
For the Derived Forms subtest, the student's task was to state a specific
derived form, once the examiner had given the base word and a sentence that
needed the derived form as the final word to complete the sentence. (The
first item on this subtest was: "Warm. He chose the jacket for its _____." The
target response was "warmth.") For the Base Forms subtest, the student's task
was to state the base form, once the examiner had given the derived form and
an appropriate sentence, designed to end with the base form. (The first item
on this subtest was: "Growth. She wanted her plant to _____." The target
response was "grow.")

The  words  on  the  test  (see  the  Appendix)  are  based  on  four  types  of 
linguistic  relationship  between  the  base  word  and  derived  form: 

a. NO CHANGE: Neither the phonology nor the orthography of the base
changes in the derived form (e.g., enjoy and enjoyment).

b. ORTHOGRAPHIC CHANGE: The spelling but not the phonology of the base
word changes in the derived form. Three types of changes were included in the
word list: the doubling of a final consonant before the suffix (e.g., sun to
sunny), the transformation of the y to i (e.g., rely to reliable), and the
omission of a final e before a suffix beginning with a vowel (e.g., endure to
endurance).

c. PHONOLOGICAL CHANGE: The pronunciation changes in the shift from the
base word to the derived form without an accompanying change in spelling.
Four kinds of phonological change were included: (a) tense to lax vowel
(e.g., heal to health), (b) vowel reduction (e.g., original and originality),
(c) shift in the pronunciation of a consonant (e.g., magic and magician), and
(d) shifts in both a vowel and a consonant pronunciation (e.g., sign and
signal).

d. BOTH CHANGE: Changes in both the orthography and the phonology occur
in the shift from base word to derived form. Among the words of this group
are representatives of different types of phonological shifts, including vowel
shifts, consonant shifts, and shifts in the pronunciation of both vowel and
consonant. Examples of BOTH CHANGE word pairs are deep and depth, decide and
decision, and reduce and reduction.

Words for the two subtests, Derived Forms and Base Forms, were selected
to be as similar as possible in length, frequency, affixation, and similarity
in meaning of the root word and its derived form. First, words on the two
subtests, type by type, do not differ in word length, as determined by number
of letters. The average length for the base words is 5.6 letters for Derived
Forms and 5.7 letters for Base Forms; the derived forms of both subtests
average 8.5 letters.

Second, an effort was made to ensure the familiarity of the words for
students in grades four through eight. As a measure of the familiarity of the
written forms, only words with a Standard Frequency Index rating of 40 or
above were used (Carroll, Davies, & Richman, 1971). (A Standard Frequency
Index of 40 indicates a word that has an estimated frequency of one in a
million words.) The words were equated for frequency by word type (NO CHANGE,
ORTHOGRAPHIC CHANGE, and so on) on the two subtests, Base Forms and Derived
Forms. The mean frequencies are as follows: for the base words, 55.1 (SD 1.8)
on the Base Forms subtest and 55.2 (SD 0.9) on the Derived Forms subtest; for
the derived words, 49.6 (SD 2.4) on the Base Forms subtest and 50.6 (SD 1.8)
on the Derived Forms subtest.
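
The Standard Frequency Index is a logarithmic scale. As a rough illustration
only, the sketch below assumes the commonly cited definition SFI = 10 x
(log10(U) + 4), where U is the estimated frequency per million words. That
formula is not given in the text and is an assumption here; the function names
are hypothetical. Under it, an SFI of 40 corresponds to one occurrence per
million words, which agrees with the statement above.

```python
import math


def sfi_to_per_million(sfi: float) -> float:
    """Estimated occurrences per million words for a given SFI.

    Assumes SFI = 10 * (log10(U) + 4); this definition is an assumption,
    not something stated in the article.
    """
    return 10 ** (sfi / 10 - 4)


def per_million_to_sfi(per_million: float) -> float:
    """Inverse conversion under the same assumed definition."""
    return 10 * (math.log10(per_million) + 4)


if __name__ == "__main__":
    # The cutoff used for word selection and the reported mean for base words.
    for sfi in (40, 55.1):
        print(sfi, round(sfi_to_per_million(sfi), 1))
```

If the assumed definition holds, each 10-point step on the index is a tenfold
change in estimated frequency.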

Third, attempts were made to control for semantic distance (i.e., the
similarity of the meanings of base and derived forms), semantic variations,
and syntactic options, all factors that can affect the difficulty of
generating morphological forms. An effort was made to select base and derived
forms
with  familiar  and  similar  meanings.  The  sentences  were  written  in  such  a  way 
as  to  constrain  possible  choices  in  meaning  and  form.  Pilot  testing  was  used 
to  eliminate  items  that  did  not  meet  these  criteria. 

The order of items on each subtest was determined by creating ten sets of
four items, each set made up of one word of each word type (NO CHANGE,
ORTHOGRAPHIC CHANGE, PHONOLOGICAL CHANGE, and BOTH CHANGE). The four word
types were randomly ordered within each set, and the ten sets were randomly
ordered on the test.
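
The blocked randomization described above can be sketched as follows. This is
a minimal illustration, not the procedure actually used: the text does not say
how individual words were assigned to sets, so the sketch simply places the
i-th word of each type in the i-th set, and the function name and placeholder
item labels are hypothetical.

```python
import random

WORD_TYPES = [
    "NO CHANGE",
    "ORTHOGRAPHIC CHANGE",
    "PHONOLOGICAL CHANGE",
    "BOTH CHANGE",
]


def order_items(items_by_type, n_sets=10, seed=None):
    """Return a test order built from n_sets blocks of four items.

    items_by_type maps each word type to a list of n_sets items.
    Assumption: the i-th item of every type goes into the i-th block.
    """
    rng = random.Random(seed)
    blocks = []
    for i in range(n_sets):
        block = [(word_type, items_by_type[word_type][i]) for word_type in WORD_TYPES]
        rng.shuffle(block)   # random order of the four word types within a set
        blocks.append(block)
    rng.shuffle(blocks)      # random order of the sets on the test
    return [item for block in blocks for item in block]


if __name__ == "__main__":
    # Placeholder labels stand in for the real words listed in the Appendix.
    demo = {wt: [f"{wt}-{i + 1}" for i in range(10)] for wt in WORD_TYPES}
    for position, (word_type, item) in enumerate(order_items(demo, seed=1), start=1):
        print(position, word_type, item)
```

Ordering the items this way guarantees that every consecutive run of four
items contains one word of each type, so no word type clusters at the start or
end of the test.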

The test was administered by means of a tape-recording in standard
English spoken by a male native speaker of American English. Directions and
practice items were given by the examiner. The directions indicated that the
student was to give the form of the word that correctly completed the
sentence. Three practice items were given to all the students; the first, for
example, was: "Farm. My uncle is a _____." The correct response was farmer.

If the student completed the first practice item incorrectly, the correct
answer was provided. The item was then repeated so that the student could
give the correct answer. Once the tape was started, the administration
continued without further assistance. If a student gave no response to a test
item in the allotted time (5 s between the end of one item and the beginning
of the next), the tape was stopped, and the student was asked if he or she
could give a form of the word that completed the sentence. After this answer
was recorded, the student was asked to try to give a prompt response to each
item and was reminded that extra time would not be given for other items.

3. The Spelling Test. This dictated spelling test was used to determine
whether the students could spell the same base words and their derived forms
that make up the Derived Forms subtest of the Test of Morphological Structure.
The first subtest (Derived Forms) consists of the 40 derived forms, given by
dictation. The second part (Base Forms) consists of the 40 base words, also
presented by dictation. The words appear in random order in each subtest.
The Derived Forms subtest was administered a week before the Base Forms
subtest so that the subjects would not be sensitized to the relationship
between the root and derived forms in spelling the derived forms.

The test was administered by means of a tape-recording in standard
English spoken by a male native speaker of American English. Each word was
presented first alone, then in a sentence, and finally alone. There was a 10-s
lapse between the last pronunciation of the spelling word and the start of the
next item. The directions and two sample items were given by the examiner
orally. The directions explained the nature of the test and the student's
task, including the procedure for writing the words. The students were told
that they could not pick up their pencils to write the dictated word until the
test item had been completed. The students were directed "to listen carefully
to each word and the way it is used in the sentence." The same directions were
used for the two subtests.

4. Test of Suffix Addition. This test was designed to determine the
extent to which students were able to apply the rules that govern the addition
of suffixes to base words. Nonsense words were used as the base words so that
the correct execution of this task could not be accomplished on the basis of
familiarity. The test consists of 30 nonsense words, each followed by an
addition sign (+), a suffix, an equal sign, and a blank line. (The first item,
for example, is "dun + er = _____.") The nonsense words were constructed from
real words by substituting one consonant for another (dun for run) or one
consonant blend for a consonant or another consonant blend (drim for swim or
prad for sad). In no case was the substituted consonant the final consonant in
the original word.

Each item on the Test of Suffix Addition requires the use of one of the
three suffix rules that form the basis of the ORTHOGRAPHIC CHANGE word type on
the Test of Morphological Structure. There are ten items for each spelling
rule: the rule governing the doubling of a final consonant (called the
"doubling" rule), the final e rule, and the final y rule. In addition, since
sometimes no change is made in the base word when the suffix is added, about
half the words required an orthographic change for correct suffix addition and
the other half did not. The test items assess the knowledge of some fairly
refined aspects of the conventions for suffix addition. For example, "leace +
able = _____" requires the knowledge that the e must be retained to indicate
the "soft" sound of the c in leace (i.e., leaceable). Such items were included
to probe the breadth of the students' knowledge of the rules that govern
suffix addition.
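
To make the three conventions concrete, the following sketch applies a
deliberately simplified version of them to one-syllable bases such as the
nonsense items quoted above. It is an illustration under stated assumptions,
not the scoring key for the test: the function name is hypothetical, a suffix
beginning with y is treated as vowel-initial, and stress-dependent doubling in
longer words and bases ending in two vowels are not handled.

```python
VOWELS = set("aeiou")


def add_suffix(base: str, suffix: str) -> str:
    """Join a one-syllable base and a suffix using three simplified rules."""
    if len(base) < 2 or not suffix:
        return base + suffix

    # Assumption: a suffix starting with a, e, i, o, u, or y counts as vowel-initial.
    vowel_suffix = suffix[0] in VOWELS or suffix[0] == "y"

    # Final y rule: consonant + y becomes i, except before a suffix starting with i.
    if base[-1] == "y" and base[-2] not in VOWELS:
        if suffix[0] == "i":
            return base + suffix
        return base[:-1] + "i" + suffix

    # Final e rule: drop the e before a vowel-initial suffix, but keep it
    # after c or g when the suffix starts with a or o (the "soft" sound).
    if base[-1] == "e" and vowel_suffix:
        if base[-2] in "cg" and suffix[0] in "ao":
            return base + suffix
        return base[:-1] + suffix

    # Doubling rule: consonant-vowel-consonant ending before a vowel-initial suffix.
    if (
        vowel_suffix
        and len(base) >= 3
        and base[-1] not in VOWELS
        and base[-1] not in "wxy"
        and base[-2] in VOWELS
        and base[-3] not in VOWELS
    ):
        return base + base[-1] + suffix

    return base + suffix


if __name__ == "__main__":
    for base, suffix in [("dun", "er"), ("leace", "able"), ("rely", "able"),
                         ("endure", "ance"), ("prad", "ness")]:
        print(f"{base} + {suffix} = {add_suffix(base, suffix)}")
```

The item "prad + ness" shows the no-change case: because the suffix begins
with a consonant, none of the three rules applies and the parts are simply
joined.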

Directions for this test were given aloud by the examiner, and two
examples were completed on the blackboard to illustrate what was expected of
the students. The directions indicated that the base words were not real words
but that the students were to put the two parts (base and suffix) together as
if they were real English words.

The  student  wrote  each  word  on  a  long  blank  line  following  the  test  item. 
There was no time limit. Most students took approximately 10 minutes to
complete the test.


Results 


Developmental  Trends  in  the  Learning  of  Derivational  Morphology 

Performances  on  the  Test  of  Morphological  Structure  (TMS)  were  initially 
scored  by  tabulating  the  number  of  correct  responses  for  each  subtest,  Base 
Forms  and  Derived  Forms.  The  mean  scores  for  each  subtest  of  the  TMS,  given 
in  Table  1,  show  that  there  was  an  increase  in  the  knowledge  of  derivational 
relationships  by  grade  level. 


Table 1

Mean number correct (and SDs) on the Test of Morphological Structure, the
Spelling Test, and the Test of Suffix Addition by grade level

        Test of Morphological         Spelling Test                 Test of Suffix
        Structure                                                   Addition
Grade   Base forms     Derived forms  Base forms     Derived forms

4       30.8           26.3           24.9           14.6           16.0
        (6.9)          (5.4)          (9.3)          (9.8)          (4.0)
6       35.2           31.9           34.2           26.0           17.9
        (4.1)          (3.8)          (4.1)          (7.5)          (3.3)
8       39.4           35.7           38.2           34.4           21.0
        (0.7)          (2.4)          (3.0)          (5.3)          (3.7)

Note: Maximum score for TMS and ST = 40. Maximum score for TSA = 30.


The performances at the three grade levels were found to be significantly
different for both the Base Forms, F(2,62) = 18.99, p < .001, and the Derived
Forms, F(2,62) = 26.37, p < .001. Paired comparisons (Scheffé, p < .05)
showed that for both the Base Forms and Derived Forms subtests the fourth
grade was significantly different from the sixth grade and from the eighth
grade, and the sixth grade was significantly different from the eighth grade.
In the eighth grade the students' performance on both subtests was close to
the ceiling level of the test.

Differences in the students' ability to generate the word forms on the
TMS cannot be attributed to differences in word frequency or word length. The
correlations between errors on the TMS and word frequency (taken from the
norms of Carroll et al., 1971) were very low for the Base Forms, r = .14, p =
.20, and for the Derived Forms, r = -.08, p = .43. Correlations between word
length and errors were also very low both for Base Forms, r = -.26, p = .02,
and for Derived Forms, r = .01, p = .90.

The Spelling Test (ST) was subjected to a similar analysis. The
students' performances were scored on the basis of the number of words spelled
correctly on each subtest, Base Forms and Derived Forms. Letters incorrectly
or ambiguously formed were counted wrong. Where the legibility of a letter or
word was questionable, one additional judge scored the word independently.
This procedure effectively removed the few instances of uncertainty.

The increase in mean number of correct spellings on the ST, as shown in
Table 1, is significant for both the Base Forms, F(2,62) = 26.69, p < .001,
and the Derived Forms, F(2,62) = 34.88, p < .001. Paired comparisons of the
group means (Scheffé, p < .05) indicate that on the Base Forms subtest the
fourth graders differed significantly from the eighth graders, and the sixth
graders differed significantly from the eighth graders. On the spelling of
the Derived Forms the fourth graders differed significantly from the sixth and
eighth graders, but the sixth graders were not significantly different from
the eighth graders. The eighth graders' spelling of the base forms was
practically at a ceiling level, although their spelling of the derived words
was somewhat less proficient.

As would be expected from other investigations of spelling skills (see
Cahen, Craun, & Johnson, 1971), the correlation between word length and
spelling errors and the correlation between word frequency and spelling errors
were low to moderate. For both the base words and the derived words, the
correlation of word length with spelling errors was .49 (p < .01). The
correlation of the frequency of base words with errors on base words was -.34,
and the correlation of frequency of derived forms with errors on derived forms
was -.37 (p < .05).

Developmental trends based on the relative difficulty of the TMS and ST
subtests were also found. On both tests, the performances on the Base Forms
subtests were significantly better than the performances on the Derived Forms
subtests: for the ST, t(64) = 13.23, p < .001; for the TMS, t(64) = 3.90, p <
.001. The superior performance on the Base Forms subtests suggests that the
ability to extract the base word from its derived form is developed before the
ability to generate the derived form from the base form. Similarly, the
spelling of the base words appears to be mastered before the spelling of their
derived counterparts.

Relationship  Between  Knowledge  of  Derivational  Morphology  and  Spelling  Ability 

The second research question concerned the relationship between learning
derivational morphology and learning to spell derived words. In order to
determine the extent to which performance on the Base Forms and Derived Forms
subtests of the TMS and ST accounted for variance in the performance at the
three grade levels, a discriminant function analysis was carried out. This
analysis generated one function that accounted for 94.[illegible]% of the
variance (Wilks' lambda = 0.3680109, p < .0001). (A second function was not
significant.) The standardized canonical coefficients of this function are as
follows: TMS, Derived Forms 0.82173; TMS, Base Forms -0.59577; ST, Derived
Forms 0.89484; ST, Base Forms 0.01003. The particularly high loadings of this
function are on the Derived Forms subtests of the TMS and ST, suggesting that
knowledge of derived forms more strongly distinguished the three grade levels
than knowledge of base forms. This function correctly predicted the grade
level of 69.23% of the group.

A second method was used to investigate the sensitivity to morphological
structure in spelling derived words. The students' spelling of each word
pair, the base form and its derived counterpart, was tabulated. Performance
on each pair was figured according to four possible patterns: both base and
derived forms incorrect (e.g., equl and equity), base correct but derived
incorrect (e.g., begin but begginer for beginner), base incorrect but derived
correct (e.g., expens for expense but expensive), and both base and derived
correct (e.g., explain and explanation).
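
The four-way tabulation of word pairs can be expressed compactly. The sketch
below is only an illustration of the scheme described above; the function
names, pattern labels, and the small sample of responses are hypothetical, and
a response is counted correct only if it matches the target spelling exactly.

```python
from collections import Counter

PATTERNS = (
    "both incorrect",
    "base only correct",
    "derived only correct",
    "both correct",
)


def classify_pair(base_target, base_response, derived_target, derived_response):
    """Assign one base/derived word pair to one of the four patterns."""
    base_ok = base_response == base_target
    derived_ok = derived_response == derived_target
    if base_ok and derived_ok:
        return "both correct"
    if base_ok:
        return "base only correct"
    if derived_ok:
        return "derived only correct"
    return "both incorrect"


def pattern_percentages(pairs):
    """Percentage of pairs falling into each pattern for one student or group."""
    counts = Counter(classify_pair(*pair) for pair in pairs)
    total = len(pairs)
    return {pattern: 100 * counts[pattern] / total for pattern in PATTERNS}


if __name__ == "__main__":
    # (base target, base response, derived target, derived response)
    sample = [
        ("begin", "begin", "beginner", "begginer"),
        ("expense", "expens", "expensive", "expensive"),
        ("explain", "explain", "explanation", "explanation"),
        ("rely", "rely", "reliable", "reliable"),
    ]
    print(pattern_percentages(sample))
```

Expressing each pattern as a percentage of the total number of pairs matches
the way the results are reported for each grade.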


[Figure 1 (bar chart by grade level). Categories: base and derived incorrect;
only base correct; only derived correct; base and derived correct.]

Figure 1. Comparison of correct and incorrect spellings of word pairs, base
and derived forms, by grade level.


The results of this analysis, shown in Figure 1, give performance on
pairs of words as a percentage of the total possible. One-way analysis of
variance showed that the instances in which the students were able to spell
both the base and derived words correctly increased significantly by grade
level, F(2,62) = 34.51, p < .001. Paired comparisons of the group means
(Scheffé, p < .05) indicate that the fourth grade was significantly different
from the sixth and eighth grades and that the sixth grade was significantly
different from the eighth grade.

Furthermore, as is evident from an examination of Figure 1, generally
speaking,  correct  spelling  of  the  base  word  is  a  precondition  for  correct 
spelling  of  the  derived  word.  While  fourth-  and  sixth-grade  students  quite 
commonly  misspelled  the  derived  word  but  spelled  the  base  word  correctly,  the 
reverse  pattern  was  extremely  uncommon — these  students  very  seldom  misspelled 
the  base  word  and  yet  spelled  the  derived  word  correctly.  The  instances  in 
which  the  base  word  was  correct  but  the  derived  form  was  incorrect  diminish 
markedly  by  the  eighth  grade — an  indication  of  rapid  learning  of  the  spelling 
of  derived  forms  by  this  grade  level. 

Performance on TMS and ST as a Reflection of Word Type

The third research question concerned the ruleful learning of
derivational morphology and the extent to which such knowledge appears to be
used in spelling derived words. To investigate this question, the experimental
tests included four types of word relationships reflecting the kinds of
transformations commonly found between base and derived words. As described
earlier, these word types are No Change (NC), Orthographic Change (OC),
Phonological Change (PC), and Both Change (BC). The premise was that the more
complex relationships, involving mastery of phonological and orthographic
rules, would generate more errors than the more transparent relationships and
would be mastered somewhat later. For the TMS, performances on both the Base
Forms and Derived Forms subtests showed a pattern of performance by word type,
in general reflecting more difficulty with the relationships that required
phonological changes or both orthographic and phonological changes, as can be
seen in Figure 2.

In order to determine the extent to which the four word types (No Change,
Orthographic Change, and so on) of the two TMS subtests (Base Forms and
Derived Forms) accounted for variance in the performance at the three grade
levels, a discriminant function analysis was carried out. This analysis
generated one function that accounted for 89.23% of the variance (Wilks'
lambda = 0.4[illegible]20459, p = .0001). (The second function was not
significant.) The standardized canonical coefficients of this function, shown
in Table 2, indicate that the highest loading is on the Phonological Change
word type of the Base Forms subtest, with moderate loadings on most of the
remaining word types (the exceptions being the No Change and Orthographic
Change word types of the Base Forms test). This function correctly predicted
the grade level of 73.85% of the group.

As on the TMS, the students' spelling performance on the Derived Forms
subtest of the ST was analyzed by word type. The question is whether
students' success in spelling derived words is a reflection of the type of
transformation between the base and derived form. (The words on the Base
Forms subtest cannot be analyzed in the same way, since the dictated word is a
single base morpheme, and there was nothing in the task to encourage the
speller to consider morphological relationships.) An examination of the
spelling errors on the four word types of the Derived Forms subtest of the ST
(shown in Figure 2) indicated that the mean number of errors differed
significantly by grade level: for NC, F(2,62) = 24.30, p < .001; for OC,
F(2,62) = 19.36, p < .001; for PC, F(2,62) = 28.30, p < .001; for BC, F(2,62)
= 50.47, p < .001. Paired comparisons (Scheffé, p < .05) indicated that for
each word type the fourth grade differed significantly from the sixth and
eighth grades, and the sixth grade differed significantly from the eighth
grade.

Table 2

Standardized canonical discriminant function coefficients for the word types
on the Base Forms and Derived Forms subtests, TMS

Word type              Base Forms    Derived Forms

No Change                 0.04983          0.40498
Orthographic Change       0.05298         -0.26030
Phonological Change       0.63323         -0.19539
Both Change               0.38875          0.20406

[Figure 2 (bar chart). Vertical axis: mean errors. Bars for grades 4, 6, and
8 on each word type (NC = No Change, OC = Orthographic Change, PC =
Phonological Change, BC = Both Change) in three panels: Oral Generation: Base
Forms; Oral Generation: Derived Forms; Spelling: Derived Forms.]

Figure 2. Mean errors by grade level on word types of three experimental
tasks.

The two Derived Forms subtests most directly assess knowledge of
transformations between base and derived forms. Consequently, the two Derived
Forms subtests, TMS and ST, were analyzed in order to determine the extent to
which the four word types (No Change, Orthographic Change, and so on) on these
subtests accounted for variance in performance at the three grade levels. A
discriminant function analysis generated one function that accounted for
93.10% of the variance (Wilks' Lambda 0.2867654 at a significance level of
0.0000). (A second function was not significant.) The standardized canonical
coefficients of this function, shown in Table 3, indicated high loadings on
the Phonological Change word type of the TMS and the Both Change word type of
the ST, suggesting that these were particularly important in accounting for
the differences in performance by grade level. Both draw on knowledge of
phonological rules, whether for generating or spelling derived forms. This
function correctly predicted the grade level of 76.92% of the group.


Table 3

Standardized canonical discriminant function coefficients for the word types
on Derived Forms subtests, TMS and ST

Word type               TMS           ST

No Change               -0.07509      -0.25244
Orthographic Change      0.09502      -0.14684
Phonological Change      0.43693       0.17305
Both Change              0.04723       0.92650

Analysis  of  Types  of  Errors  on  the  TMS 

Analysis of errors on the TMS provided further insight into the mastery
of the rulefulness of derivational morphology. The decision to analyze the
types of errors on the TMS (Derived Forms subtest) arose from the observation
of patterns among the students' incorrect responses. The errors fell
naturally into four categories: BASE ONLY for no response other than
repetition of the base word (e.g., sign for sign); RULEFUL for ruleful but
nonexistent words (e.g., revisement for revision); UNUSUAL for unusual but
possible answers (e.g., healing instead of health in response to the item,
"Heal. His sister was worried about his _____."); and INAPPROPRIATE for
nonruleful, nonexistent words (e.g., consumeration for consumption) or for
existing words that were inappropriate answers (e.g., glorify instead of
glorious in response to the item, "Glory. The view from the hill top was
_____."). Table 4 shows the average number of errors in each category made at
the three grade levels.
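
The four categories can be read as a small decision procedure. The sketch below is illustrative only and is not the scoring procedure used in the study: it assumes the response has already been scored as incorrect, it treats a response as "ruleful" when it is the base word (with or without its final e) plus one of the suffixes reported for the RULEFUL errors, and the word sets `lexicon` and `acceptable` are hypothetical stand-ins for a scorer's judgment about which responses are existing words and which of those suit the sentence.

```python
# Illustrative sketch of the four error categories described above.
# SUFFIXES lists the suffixes the text reports in RULEFUL errors; the lexicon
# and the "acceptable" set below are hypothetical stand-ins for scorer judgment.
SUFFIXES = ("ment", "ance", "tion", "ness", "less")


def is_ruleful(base: str, response: str) -> bool:
    """True if response is the base (with or without a final e) plus a suffix."""
    stems = [base]
    if base.endswith("e"):
        stems.append(base[:-1])
    return any(response == stem + suffix for stem in stems for suffix in SUFFIXES)


def classify_error(base: str, response: str, lexicon: set, acceptable: set) -> str:
    """Assign an incorrect response to one of the four error categories."""
    if response == base:
        return "BASE ONLY"
    if response in lexicon:
        # An existing word counts as UNUSUAL only if it also suits the sentence.
        return "UNUSUAL" if response in acceptable else "INAPPROPRIATE"
    return "RULEFUL" if is_ruleful(base, response) else "INAPPROPRIATE"


lexicon = {"healing", "glorify"}  # hypothetical list of existing words
acceptable = {"healing"}          # hypothetical: existing words that fit the item

for base, response in [("sign", "sign"), ("revise", "revisement"),
                       ("heal", "healing"), ("consume", "consumeration"),
                       ("glory", "glorify")]:
    print(base, response, classify_error(base, response, lexicon, acceptable))
```

The examples are the ones given in the text; a real scoring scheme would need a fuller lexicon and a human decision about sentence fit.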

Table 4

Mean errors (and SDs) on error types of the Derived Forms subtest, Test of
Morphological Structure, by grade level

                              Error types

Grade     Base Only     Ruleful       Unusual       Inappropriate

4         4.7 (3.3)     2.5 (3.3)     2.3 (1.6)     3.5 (4.4)
6         3.4 (2.9)     1.0 (1.2)     1.5 (1.2)     1.8 (1.5)
8         1.0 (1.2)     0.7 (0.8)     1.5 (1.1)     0.8 (0.7)

Analysis of variance showed that the errors in three of the categories
decreased significantly by grade level: the BASE ONLY errors, F(2,62) = 10.66,
p < .001; the RULEFUL errors, F(2,62) = 4.46, p < .05; and the INAPPROPRIATE
errors, F(2,62) = 6.03, p < .01. The UNUSUAL errors were not significantly
different by grade level. Further examination was made of two of the error
types that seemed to be of particular interest: the RULEFUL errors and the
UNUSUAL errors. Ninety-one RULEFUL errors (17% of the total) were made
altogether: 60.4% by fourth graders, 24.2% by sixth graders, and 15.4% by
eighth graders. Perhaps more revealing than the number of errors is the
nature of the RULEFUL errors. Eighty-two percent of the errors were made on
words that undergo a phonological change (with or without an accompanying
orthographic change) in their derived forms (e.g., revise to revision). For
97% of these errors, the version given preserved the phonological identity of
the base word (e.g., revisement instead of the target word, revision).
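
The grade-level breakdown of the 91 RULEFUL errors can be turned back into whole-number counts. This is a small sketch, taking the fourth-grade share as 60.4%; the counts it produces are inferred from the percentages and are not reported in the text.

```python
# Sketch: whole-number error counts implied by the reported percentages.
TOTAL_RULEFUL = 91
reported_percent = {"grade 4": 60.4, "grade 6": 24.2, "grade 8": 15.4}

implied_counts = {
    grade: round(percent / 100 * TOTAL_RULEFUL)
    for grade, percent in reported_percent.items()
}

for grade, count in implied_counts.items():
    # Convert each count back to a percentage as a consistency check.
    print(grade, count, f"{100 * count / TOTAL_RULEFUL:.1f}%")
print("sum:", sum(implied_counts.values()))
```

The three implied counts sum to 91, and each converts back to the reported percentage at one decimal place, so the three figures are mutually consistent.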

In addition, the students seemed to show a preference for certain
suffixes in creating their RULEFUL errors. Most popular was -ment (accounting
for 58% of the errors), followed by -ance, -tion, -ness, and -less. All of
these suffixes were used to create a derived form without a phonological
change in the base word. There is no reason to believe that the students were
biased toward the use of any particular suffix by the other words on the test.
For instance, the only test item with -ment as a suffix is the word enjoyment.

Unlike the other error categories, the UNUSUAL errors did not diminish
significantly between the fourth and eighth grades. Analysis of the words on
which such errors were made, as well as the kinds of responses given,
indicates that the UNUSUAL errors occurred with the presentation of specific
base words and sentences. Most (80%) of these errors occurred in generating
the derived words from the following base words: warm, deep, equal, active,
consume, and heal. Four of these six undergo a phonological change from the
base to the target derived form, and yet most of the responses retained the
sound of the base word (e.g., healing instead of the target word, health).
The responses are unusual in that they suit the sentence but were not
anticipated as likely answers, given the structure of the sentence. In this
respect, the UNUSUAL answers are acceptable if not ideal and may reflect an
inability to associate the base and derived word forms. We cannot infer that
the students did not know the words health or consumption.

Performance on the Test of Suffix Addition

The  students'  mastery  of  orthographic  "rules"  was  examined  by  means  of 
the  Test  of  Suffix  Addition  (TSA).  The  students'  scores  on  the  TSA  consisted 
of  the  number  of  correct  responses.  In  several  instances,  responses  were 
written  with  a  letter  omitted,  substituted,  or  placed  in  the  wrong  order  in  a 
part  of  the  base  word  that  was  not  essential  to  the  suffixation.  Such  answers 
were  not  counted  as  incorrect  if  the  suffix  was  correctly  attached  (e.g., 
belndish for blendish). However, where the miscopying of a base word in any
way affected the addition of a suffix or where the suffix itself was
misspelled, the answer was counted as wrong (e.g., pludding for pludying).

The  students'  performance  on  the  TSA,  shown  in  Table  1,  indicates 
improvement  in  the  ability  to  add  suffixes  to  nonsense  words,  following  the 
-y,  -e,  and  doubling  rules.  The  scores  also  show  that  even  at  the 
eighth-grade  level,  the  students  have  not  fully  mastered  the  suffix  addition 
rules. The difference between grade levels was significant, F(2,62) = 10.25,
p < .001. Paired comparisons of the group means (Scheffé, p < .05) show that
the fourth and sixth grades were significantly different from the eighth
grade,  but  that  the  fourth  grade  was  not  significantly  different  from  the 
sixth  grade.  The  more  pronounced  growth  appears  to  take  place  between  the 
sixth  and  eighth  grades. 
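
The three conventions tested by the TSA can be sketched as a single function. This is a simplified illustration of the general -y, -e, and doubling conventions of English spelling, not the scoring key for the test: it treats every final e as silent and ignores syllable stress, so it would wrongly double the final consonant of an unstressed base such as profit. The function name `add_suffix` is hypothetical.

```python
# Simplified sketch of the -y, -e, and doubling conventions (assumptions noted
# above: every final e is treated as silent, and stress is ignored).
VOWELS = set("aeiou")


def add_suffix(base: str, suffix: str) -> str:
    """Attach suffix to base using three simplified spelling conventions."""
    starts_with_vowel = suffix[0] in VOWELS or suffix == "y"
    # -y rule: consonant + y changes y to i unless the suffix begins with i.
    if base.endswith("y") and len(base) > 1 and base[-2] not in VOWELS:
        if suffix[0] != "i":
            return base[:-1] + "i" + suffix
        return base + suffix
    # -e rule: drop a final e before a suffix that begins with a vowel.
    if base.endswith("e") and starts_with_vowel:
        return base[:-1] + suffix
    # Doubling rule: a consonant-vowel-consonant ending doubles its last letter
    # before a suffix that begins with a vowel.
    if (starts_with_vowel and len(base) >= 3
            and base[-1] not in VOWELS and base[-1] not in "wxy"
            and base[-2] in VOWELS and base[-3] not in VOWELS):
        return base + base[-1] + suffix
    return base + suffix


for base, suffix in [("pludy", "ing"), ("blend", "ish"), ("sun", "y"),
                     ("happy", "ness"), ("endure", "ance"), ("swim", "er")]:
    print(base, "+", suffix, "->", add_suffix(base, suffix))
```

Because the rules operate on letter patterns alone, they apply to nonsense bases as readily as to real words, which is the property the TSA relies on.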


Discussion 

This  study  set  out  to  investigate  the  knowledge  of  derivational  morpholo¬ 
gy  at  the  fourth-,  sixth-,  and  eighth-grade  levels  and  to  investigate  the  ex¬ 
tent  to  which  this  knowledge  is  reflected  in  the  students'  spelling  of  derived 
words.  The  results  of  the  study  have  shown  that  students  appear  to  learn  a 
great  deal  about  derivational  morphology  between  the  fourth  and  eighth  grades. 
Their  knowledge  reflects  varied  levels  of  understanding  of  the  underlying 
phonological  rules  and  the  orthographic  rules  that  govern  the  transformations 
from  base  to  derived  forms.  In  addition,  there  are  some  indications  that  stu¬ 
dents  learn  to  spell  derived  forms  by  reference  to  morphemic  structure. 
Still,  the  spelling  of  derived  forms  lags  behind  the  knowledge  of  these  forms. 
Even by the eighth grade, students do not have a full mastery of the more
complex  transformations  between  base  and  derived  forms  or  of  the  suffix  addi¬ 
tion  rules. 

Developmental  Trends  in  the  Learning  of  Derivational  Morphology 

Significant  growth  toward  mastery  was  found  on  each  of  the  three  tasks 
that  assessed  morphological  knowledge — the  generation  of  base  and  derived 
forms  and  the  spelling  of  the  derived  forms.  The  test  results  yield  some 
indication  of  the  order  in  which  different  skills  are  acquired.  First,  the 
ability  to  extract  base  forms  from  derived  forms  was  mastered  before  the  abil¬ 
ity  to  generate  derived  forms  from  base  forms.  Second,  the  ability  to  spell 
base  words  was  mastered  before  the  ability  to  spell  their  derived 
counterparts. Third, the ability to produce the correct base and derived
forms orally was generally superior at each grade level to the ability to
spell the base and derived forms. Finally, application of the suffix addition
rules was not fully mastered by the eighth grade, although the students'
ability to apply these rules improved significantly between the sixth and
eighth grades.

The  task  of  extracting  the  base  word  (given  the  derived  form  and  an  ap¬ 
propriate  sentence  context)  requires  the  ability  to  analyze  morphemic  struc¬ 
ture,  while  the  task  of  generating  the  correct  derived  form  (given  the  base 
form  and  an  appropriate  sentence  context)  involves  an  awareness  of  the  syntac¬ 
tic  and  semantic  form  suitable  for  a  particular  sentence  context.  This  aware¬ 
ness,  in  turn,  depends  on  a  knowledge  of  the  available  and  acceptable  forms  of 
a  given  word  (such  as  equality  instead  of  equalness).  The  students  differed 
significantly in their proficiency on these two tasks, the Base Forms and
Derived Forms subtests of the Test of Morphological Structure (TMS). It is
evidently easier to analyze the morphemic structure of derived forms than it
is to produce an appropriate derived form. While the two tasks differ in
difficulty, the mean scores on both subtests increase significantly by grade
level. Improvement on the Derived Forms subtest was particularly dramatic, as
the fourth graders had a mean score of 26.3 correct (the maximum possible
being 40), while the eighth graders had a mean score of 35.7 correct (see
Table 1). The eighth graders approached the ceiling level on both subtests of
the TMS, which gives an indication of the point at which students become
competent at analyzing the morphemic structure of derived words and knowing
the proper word forms, given words of this level of difficulty.

Since the words on the two subtests were chosen to be equally familiar,
we  can  surmise  that  the  particular  source  of  difficulty  in  learning  deriva¬ 
tional  morphology  is  less  learning  to  analyze  morphemic  structure  of  derived 
words  than  learning  appropriate  derived  word  forms.  One  important  aspect  of 
this  contrast  may  be  the  different  demands  each  of  the  tasks  makes  on  an 
individual.  It  is  likely  that  production  of  a  word  form  is  more  taxing  than 
analysis  of  the  structure  of  a  given  word.  However,  this  general  observation 
needs  to  be  examined  in  regard  to  individual  differences  in  performance.  For 
individuals  who  have  trouble  understanding  the  morphemic  structure  of  words 
(Wiig,  Semel,  &  Crouse,  1973),  the  two  tasks  might  be  equally  challenging. 

Learning  to  Spell  Derived  Forms 

Comparisons of the students' performances on the TMS and the Spelling
Test (ST) confirm our expectation that spelling is a more difficult task than
orally generating word forms. It is not surprising that skill in spelling
derived forms appears to develop later than skill in generating derived forms.

Performances on the ST suggest that spelling derived forms draws on a
knowledge of morphological relationships. When spelling performances on each
word pair (the base and its derived form) were analyzed, mastery of the
spelling of derived forms seemed to depend on initial learning of the spelling
of the base forms. As Figure 1 shows, students very seldom spelled a derived
form correctly when they spelled the base form incorrectly, whereas they quite
commonly misspelled the derived form when they had spelled the base form
correctly. It is unlikely that this pattern would be so pronounced if derived
words were learned as unanalyzed whole words. In addition, there is a
relatively small decrease in the percentage of the instances in which the base
is correct but the derived form is incorrect (29% to 11%); in contrast, there
is a very large increase in the percentage of instances in which both the base
and derived forms are spelled correctly (33% to 8?%). This suggests a rapid
improvement in the ability to manage the "derived" part of the derived forms
(including orthographic and phonological transformations), along with an
improvement in the ability to spell the base forms.

Another  indication  of  the  use  of  morphemic  analysis  in  spelling  of  de¬ 
rived  forms  comes  from  the  students’  performance  on  the  Test  of  Suffix  Addi¬ 
tion.  Since  this  test  involves  adding  suffixes  correctly  to  nonsense  words, 
it requires explicit knowledge of specific suffix conventions (those governing
the addition of suffixes to words ending in a silent e, in y, and in a single
consonant).  These  suffix  rules,  which  differ  from  the  linguistic  rules  that 
govern  phonological  and  orthographic  transformations,  are  appropriately  viewed 
as  conventions  of  writing  that  govern  the  correct  spelling  of  both  inflected 
and  derived  forms.  They  are  most  likely  learned  by  observation  of  the  pat¬ 
terns  of  suffix  addition  in  the  orthography  or  by  direct  instruction  in 
school.  (The  alternative  would  be  memorization  of  the  sequence  of  letters 
used  to  spell  each  derived  word,  an  unwieldy  system,  given  the  large  number  of 
derived  words  the  students  are  learning  and  can  use  in  their  writing.)  The 
students'  performances  on  this  test  show  an  improvement  in  the  mastery  of  the 
three  suffix  rules,  particularly  between  the  sixth  and  eighth  grades.  (The 
fourth  graders'  performance  was  not  significantly  different  from  that  of  the 
sixth  graders.)  Still,  even  the  eighth  graders  had  not  mastered  the  rules 
completely.  As  was  anticipated,  the  learning  of  these  suffix  addition  rules 
seems  to  take  place  later  than  the  mastery  of  the  morphological  structure  of 
words. 

Ruleful  Learning  of  Derivational  Morphology 

The words on the TMS represent four types of transformations between base
and derived word forms. These are: NO CHANGE in the orthography and phonology
(e.g., enjoy to enjoyment), ORTHOGRAPHIC CHANGE only (e.g., rely to reliable),
PHONOLOGICAL CHANGE only (e.g., major to majority), and BOTH CHANGE, in the
orthography and the phonology (e.g., reduce to reduction). The NO CHANGE
word type represents the most transparent relationship, while the BOTH CHANGE
word type represents the most obscure relationship. Analysis of the test
results suggests that the nature of the transformation between base and derived
forms affected the accessibility of knowledge of morphological relatedness, as
was expected (see Figure 2). On the Base Forms part of the TMS the NO CHANGE
words had the fewest errors and the BOTH CHANGE words had the most, while
PHONOLOGICAL CHANGE words and ORTHOGRAPHIC CHANGE words fell between these two
extremes. On the Derived Forms subtest the PHONOLOGICAL CHANGE and BOTH
CHANGE words gave much more difficulty than the NO CHANGE and ORTHOGRAPHIC
CHANGE words.
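
The four word types form a two-by-two classification, which can be written out explicitly. In this sketch the two flags for each pair are hand-coded judgments matching the examples given in the text; nothing here detects a phonological or orthographic change automatically, and the function name `word_type` is hypothetical.

```python
# Sketch: the four word types as a 2 x 2 crossing of two yes/no judgments.
def word_type(orthographic_change: bool, phonological_change: bool) -> str:
    """Name the transformation type for a base/derived pair."""
    if orthographic_change and phonological_change:
        return "BOTH CHANGE"
    if phonological_change:
        return "PHONOLOGICAL CHANGE"
    if orthographic_change:
        return "ORTHOGRAPHIC CHANGE"
    return "NO CHANGE"


# (base, derived, orthographic change?, phonological change?), hand-coded
examples = [
    ("enjoy", "enjoyment", False, False),
    ("rely", "reliable", True, False),
    ("major", "majority", False, True),
    ("reduce", "reduction", True, True),
]

for base, derived, orth, phon in examples:
    print(f"{base} -> {derived}: {word_type(orth, phon)}")
```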

A discriminant function analysis of the four word types on the two
subtests of the TMS (Base Forms and Derived Forms) yielded one significant
function that accounted for over 89% of the variance, indicating the power of
these variables in distinguishing the students at the three grade levels.
Contributing to the power of this function were all of the word types except
the NO CHANGE and ORTHOGRAPHIC CHANGE types on the Base Forms subtest,
possibly indicating that general mastery of the system of transformations
distinguished the three grade levels.

Performance by word type was considered a particularly important
indication of ruleful learning on the two Derived Forms subtests: the oral
generation task (Derived Forms subtest of the TMS) and spelling (Derived Forms
subtest of the ST). The four word types on these subtests were included in a
discriminant function analysis. This analysis yielded one significant
function that accounted for 93% of the variance. Of the standardized canonical
coefficients, the heaviest loading was on the BOTH CHANGE word type of the
ST, the second strongest contributor being the PHONOLOGICAL CHANGE word type
of the TMS. These results suggest that mastery of both phonological and
orthographic rules in spelling most strongly distinguishes the grade levels.
Knowledge of the underlying phonological rule system also discriminates the
three grade levels in performance on derived words, whether the task be oral
generation or spelling.

Despite  these  findings,  performance  on  the  Derived  Forms  subtest  of  the 
Spelling  Test  shows  that  the  distribution  of  errors  by  word  type  is  relatively 
even,  a  pattern  evident  at  all  three  grade  levels  (see  Figure  2).  There  are 
several  possible  reasons  for  the  modest  effect  by  word  type  in  spelling. 
First,  the  two  tasks  (oral  generation  and  dictated  spelling)  are  very  differ¬ 
ent  in  one  important  respect.  In  generating  derived  forms,  the  student  had  no 
choice  but  to  work  with  the  morphemic  structure  of  the  word.  However,  in 
spelling  the  derived  forms,  the  students  were  given  the  derived  word  by  dicta¬ 
tion,  and  so  the  task  did  not  require  them  to  deal  with  the  word's  morphemic 
structure.  The  fact  that  there  is  any  consistency  in  the  effects  of  word  type 
by  grade  level  suggests  that  some  knowledge  of  orthographic  and  phonological 
transformations, at least, plays a role in the process of spelling derived
forms.  Second,  spelling  is  a  complex  skill,  offering  many  opportunities  for 
error.  Clearly,  the  difficulty  of  spelling  a  derived  word  is  not  simply  a 
reflection  of  its  word  type. 

Finally, analysis of the kinds of errors students made on the Derived
Forms subtest of the TMS gives additional support to the argument that the
nature of
the  transformations  between  base  and  derived  forms  affects  the  ease  of  master¬ 
ing  morphological  relationships.  The  two  error  types  selected  for  detailed 
analysis  (the  RULEFUL  errors  and  the  UNUSUAL  errors)  were  found  to  fall 
primarily  on  those  words  that  belonged  to  the  PHONOLOGICAL  CHANGE  or  BOTH 
CHANGE  word  types.  The  students'  most  common  error  was  a  form  of  the  word 
that  retained  the  sound  of  the  base  word,  whether  the  response  was  an  actual 
word  or  a  ruleful  invention.  This  pattern  suggests  that  the  younger  students 
know  something  about  the  system  of  forming  derivatives  but  have  not  yet 
learned  all  of  the  appropriate  phonological  changes.  In  fact,  a  large  propor¬ 
tion  of  their  errors  showed  a  resistance  to  making  phonological  changes  in 
giving  derived  forms.  The  students  often  simply  added  one  of  the  more  common 
and  familiar  suffixes  ("all-purpose"  suffixes  such  as  -ment)  to  the  base  word. 
For  example,  a  number  of  students  spontaneously  invented  the  form  producement, 
not  knowing  or  not  recognizing  the  morphological  relationship  of  the  correct 
response,  production. 

The  RULEFUL  and  the  UNUSUAL  errors  were  in  many  respects  quite  similar; 
the  UNUSUAL  errors  were  differentiated  primarily  because  they  were  existing 
English words, while the responses that made up the RULEFUL errors could be
English  words  but  for  whatever  reason  are  not — such  is  the  complexity  of 
derivational  morphology.  Thus,  even  the  students  who  do  not  have  a  complete 
understanding  of  the  complex  transformations  still  understand  some  basic 
principles  about  how  the  system  of  derivational  morphology  works. 

Instructional Implications

This  study  provides  some  evidence  that  sixth-  and  eighth-grade  students 
draw  on  their  understanding  of  the  morphemic  structure  of  the  words  to  guide 
their  spellings  of  derived  words.  First,  there  was  a  strong  relationship  be¬ 
tween  correct  spelling  of  the  base  words  and  correct  spelling  of  their  derived 
counterparts.  We  might  surmise  that  along  with  learning  how  to  spell  the  base 
words,  the  students  are  acquiring  morphological  awareness — a  sensitivity  to 
word  relationships  and  an  inclination  to  use  knowledge  of  morphological  rela¬ 
tionships  in  spelling.  Second,  the  students  demonstrated  improved  ability  to 
apply  the  orthographic  rules  that  govern  suffix  addition,  indicating  that  gen¬ 
eral  principles  are  learned  and  applied  to  the  spelling  of  derivatives. 
Nonetheless,  the  spelling  of  derived  words  lagged  behind  mastery  of  the  system 
of  the  transformations  between  base  and  derived  forms.  The  test  results  sug¬ 
gest  that  although  a  student  may  demonstrate  an  understanding  of  morphemic 
structure  when  asked  to  analyze  words,  he  or  she  may  not  put  this  knowledge  to 
use  on  a  dictated  spelling  test  of  derived  words,  particularly  where  there  are 
phonological  and  orthographic  transformations. 

Since  the  students  demonstrate  some  productive  knowledge  of  derivational 
morphology,  they  have  the  potential,  given  suitable  instruction,  to  develop  an 
explicit  awareness  of  the  relationship  between  the  word  forms  and  their  spel¬ 
lings.  However,  even  the  eighth  graders  still  have  not  mastered  the  spelling 
of  PHONOLOGICAL  CHANGE  and  BOTH  CHANGE  derived  words  and  the  suffix  addition 
rules.  It  seems  likely  that  students  in  the  fourth  through  eighth  grades 
might  benefit  by  spelling  instruction  that  explicitly  emphasizes  morphological 
relationships and the principles that govern the addition of suffixes. One
training study has been done that suggests the particular benefits of a
morphemically-based spelling program (Robinson & Hesse, 1981). The
seventh-grade students who received training in the morphemic structure of
words showed more improvement than a control group in general spelling
performance and in specific performance on morphemically complex words.

Poor spellers and learning-disabled students, who have been found to be
deficient in their understanding of morphological rules (Wiig et al., 1973),
might benefit particularly from intensive and explicit instruction in the
morphemic structure of words. In a school system whose spelling program
includes instruction in morphemic analysis and spelling rules, both good and
poor spellers showed gradual improvement in their spelling of words with
suffixes between the seventh, eighth, and ninth grades (Carlisle, 1984).
However, the poor spellers continued to lag well behind their peers. They seem
to need more intensive instruction over a longer period of time to make
significant improvement in their ability to spell words with suffixes.

Explicit  instruction  in  morphological  relationships,  including  phonologi¬ 
cal  and  orthographic  transformations,  might  enhance  both  the  students'  under¬ 
standing  of  the  structure  of  the  language  and  their  ability  to  spell  derived 
words.  Such  instruction  could  commence  at  the  fourth-grade  level. 

Appendix

Test of Morphological Structure

                Derived Forms Subtest              Base Forms Subtest

                Given        (Target Response)     Given          (Target Response)

No Change
   1.           warm         (warmth)              growth         (grow)
   2.           enjoy        (enjoyment)           employment     (employ)
   3.           appear       (appearance)          difference     (differ)
   4.           care         (careful)             fearful        (fear)
   5.           final        (finally)             usually        (usual)
   6.           profit       (profitable)          remarkable     (remark)
   7.           perform      (performance)         assistance     (assist)
   8.           humor        (humorous)            dangerous      (danger)
   9.           honest       (honesty)             royalty        (royal)
  10.           precise      (precisely)           extremely      (extreme)

Orthographic Change
   1.           sun          (sunny)               foggy          (fog)
   2.           swim         (swimmer)             runner         (run)
   3.           begin        (beginner)            propeller      (propel)
   4.           endure       (endurance)           guidance       (guide)
   5.           active       (activity)            density        (dense)
   6.           adventure    (adventurous)         continuous     (continue)
   7.           expense      (expensive)           sensitive      (sense)
   8.           happy        (happiness)           emptiness      (empty)
   9.           glory        (glorious)            furious        (fury)
  10.           rely         (reliable)            variable       (vary)

Phonological Change
   1.           equal        (equality)            humanity       (human)
   2.           original     (originality)         personality    (personal)
   3.           drama        (dramatic)            periodic       (period)
   4.           magic        (magician)            musician       (music)
   5.           protect      (protection)          election       (elect)
   6.           express      (expression)          discussion     (discuss)
   7.           electric     (electricity)         publicity      (public)
   8.           sign         (signal)              national       (nation)
   9.           major        (majority)            popularity     (popular)
  10.           heal         (health)              cleanly        (clean)

Both Change
   1.           deep         (depth)               width          (wide)
   2.           type         (typical)             athletic       (athlete)
   3.           explain      (explanation)         combination    (combine)
   4.           produce      (production)          reduction      (reduce)
   5.           permit       (permission)          admission      (admit)
   6.           expand       (expansion)           extension      (extend)
   7.           absorb       (absorption)          description    (describe)
   8.           revise       (revision)            recognition    (recognize)
   9.           decide       (decision)            division       (divide)
  10.           consume      (consumption)         assumption     (assume)

RELATIONS  AMONG  REGULAR  AND  IRREGULAR,  MORPHOLOGICALLY-RELATED  WORDS  IN  THE 
LEXICON  AS  REVEALED  BY  REPETITION  PRIMING* 


Carol A. Fowler,† Shirley E. Napps,†† and Laurie B. Feldman†††


Abstract. Several experiments examined repetition priming among
morphologically related words as a tool to study lexical organization.
The first experiment replicated a finding by Stanners, Neisser, Hernon,
and Hall (1979) that whereas inflected words prime their unaffixed
morphological relatives as effectively as do the unaffixed forms
themselves, derived words are effective, but weaker, primes. The
experiment also suggested, however, that this difference in priming may
have an episodic origin relating to the less formal similarity of derived
than of inflected words to unaffixed morphological relatives. A second
experiment reduced episodic contributions to priming and found equally
effective priming of unaffixed words by themselves, by inflected
relatives, and by derived relatives. Two additional experiments found
strong priming among relatives sharing the spelling and pronunciation of
the unaffixed stem morpheme, sharing spelling alone or sharing neither
formal property exactly. Overall, results were similar with auditory and
visual presentations. Interpretations that repetition priming reflects
either repeated access to a common lexical entry or associative semantic
priming are both rejected in favor of a lexical organization in which
components of a word (e.g., a stem morpheme) may be shared among distinct
words without the words themselves, in any sense, sharing a "lexical
entry."

Words  presented  for  lexical  decision  are  more  rapidly  classified,  and 
words  presented  under  poor  viewing  or  listening  conditions  are  more  readily 
reported,  if  they  have  been  presented  previously  in  the  experimental  setting 
than if they have not (e.g., Forbach, Stanners, & Hochhaus, 1974; Murrell &
Morton, 1974; Scarborough, Cortese, & Scarborough, 1977). We will refer to
this  general  outcome  as  "repetition  priming."  Morton  (e.g.,  1981)  and 
Stanners,  Neisser,  Hernon,  and  Hall  (1979)  have  interpreted  repetition  priming 
as  a  consequence  of  repeated  access  to  a  lexical  entry.  Other  research  has 
identified  both  episodic  (Feustel,  Shiffrin,  &  Salasoo,  1983;  Jacoby  &  Dallas, 
1981) and strategic (Forster & Davis, 1984; Oliphant, 1983) components to the
priming  effect  as  well. 


*Memory & Cognition, 13, 241-255.
†Also Dartmouth College.
††Dartmouth College.
†††Also University of Delaware.

The lexical interpretation is of particular interest in light of patterns
of  priming  that  are  observed  among  morphologically-related  words.  Priming  may 
occur  in  two  forms  that  we  refer  to  as  "full"  and  "partial."  Full  priming  is 
priming  of  one  word  by  another  that  is  as  large,  statistically,  as  priming  of 
a  word  by  itself.  Partial  priming  is  priming  of  one  word  by  another  that  is 
present,  statistically,  but  is  significantly  less  than  priming  of  a  word  by 
itself.  Generally,  the  findings  are  that  priming  of  a  base  word  by  regularly 
inflected  morphological  relatives  is  full,  while  priming  by  derived  forms  is 
partial  (Stanners  et  al.,  1979).  Priming  by  irregularly  affixed  words  may  be 
partial  (Stanners  et  al.,  1979)  or  absent  (Kempley  &  Morton,  1982). 
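
The definitions of full and partial priming above amount to a small decision rule over two statistical comparisons. A minimal Python sketch follows, assuming each comparison has already been reduced to a significance flag. The function name, its arguments, and the "indeterminate" label are illustrative assumptions, not terms from the source.

```python
def classify_priming(faster_than_unprimed: bool, weaker_than_identity: bool) -> str:
    """Label a priming effect using the full/partial/absent scheme.

    faster_than_unprimed: the primed target is reliably faster than an
        unprimed first presentation (priming is statistically present).
    weaker_than_identity: the priming is reliably smaller than priming of
        a word by itself.
    """
    if faster_than_unprimed and not weaker_than_identity:
        return "full"       # as large, statistically, as priming of a word by itself
    if faster_than_unprimed and weaker_than_identity:
        return "partial"    # present, but significantly less than identity priming
    if weaker_than_identity:
        return "absent"     # not reliably present and reliably below identity priming
    # Neither comparison is reliable: the definitions do not cover this case,
    # so it is flagged rather than forced into a category (an assumption).
    return "indeterminate"


if __name__ == "__main__":
    print(classify_priming(True, False))
    print(classify_priming(True, True))
```

The fourth branch is worth keeping explicit: a condition that differs reliably from neither the unprimed baseline nor the identity condition cannot be assigned to a category by these definitions alone.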

Stanners  et  al.  interpret  full  priming  as  evidence  that  stem  forms  and 
inflected  relatives  share  a  lexical  entry;  they  interpret  partial  priming  as 
evidence  that  stem  forms  and  derived  words  are  neighbors  in  the  lexicon.  This 
pattern  of  priming  and  its  interpretation  are  appealing  in  supporting  plausi¬ 
ble  roles  for  lexical  entries  in  language  use.  One  role  has  repetition  prim¬ 
ing  as  a  by-product;  a  second  role  gives  repetition  priming  its  patterning. 

In Morton's theory of the lexicon (1969, 1981), lexical entries are
"logogens" which collect evidence for the occurrence in stimulation of the
words  they  represent.  Sufficient  evidence,  exceeding  a  logogen's  threshold, 
causes  the  logogen  to  "fire."  As  one  consequence  of  firing,  the  threshold  is 
lowered  temporarily  so  that  less  evidence  is  necessary  for  firing  if  the  word 
is  presented  a  second  time.  The  threshold  rises  very  slowly  over  time. 
Thresholds  of  frequent  words  are  kept  permanently  lowered  by  the  frequent 
recurrence  of  the  words  in  stimulation.  The  frequency-sensitive  thresholds  of 
logogens  explain  repetition  priming,  but  more  usefully  for  language  users, 
they  prepare  language  users  for  perception  of  words  most  likely  to  occur  in 
the  environment.  In  this  role,  repetition  priming  is  a  by-product  of  the  nor¬ 
mal  operation  of  the  logogen  system. 
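
The threshold mechanism just described can be made concrete with a toy simulation. The Python sketch below is only an illustration of the verbal account: the class name, the numeric threshold, the size of the drop after firing, and the linear recovery rate are all assumed values chosen for readability, and none of them comes from Morton's model or from the present study.

```python
from dataclasses import dataclass, field


@dataclass
class Logogen:
    """Toy evidence counter for one word; all numbers are illustrative assumptions."""
    resting_threshold: float = 10.0   # evidence needed when the word is not primed
    drop_on_fire: float = 3.0         # temporary lowering after the logogen fires
    recovery_per_step: float = 0.05   # slow rise back toward the resting level
    threshold: float = field(init=False)

    def __post_init__(self) -> None:
        self.threshold = self.resting_threshold

    def present(self, evidence: float) -> bool:
        """Return True if the evidence reaches threshold; firing lowers the threshold."""
        fired = evidence >= self.threshold
        if fired:
            self.threshold = max(0.0, self.threshold - self.drop_on_fire)
        return fired

    def elapse(self, steps: int = 1) -> None:
        """Let the threshold drift slowly back up, never past its resting level."""
        self.threshold = min(
            self.resting_threshold,
            self.threshold + steps * self.recovery_per_step,
        )


if __name__ == "__main__":
    unprimed = Logogen()
    primed = Logogen()
    primed.present(10.0)   # first presentation fires and lowers the threshold
    primed.elapse(9)       # nine intervening trials
    # Weaker evidence now suffices for the primed word but not for the unprimed one.
    print(unprimed.present(8.0), primed.present(8.0))
```

In this sketch a frequent word would simply be one whose logogen fires often enough that the threshold never returns to its resting level, which is the sense in which repetition priming falls out of the frequency mechanism as a by-product.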

Arguably,  this  mechanism  would  work  well  if,  as  the  repetition  priming 
data  suggest,  the  lexicon  counted  a  stem  morpheme  and  its  regularly  inflected, 
but not derived, forms as the same word. Unaffixed words and their inflected
relatives  are  the  same  part  of  speech  with  essentially  the  same  core  meaning; 
in  a  sense  they  are  the  same  word  with  the  difference  between  them  determined 
by  the  grammatical  context  in  which  the  word  appears.  Consequently  a  common 
frequency-based  expectancy  is  meaningful  for  classes  of  words  differing  only 
in  inflectional  affix.  In  contrast,  unaffixed  words  and  their  derived  rela¬ 
tives  often  are  not  the  same  part  of  speech,  they  need  not  be  close  in  meaning 
(cf.  Aronoff,  1976),  and,  consequently,  a  common  frequency-based  expectancy 
for unaffixed words and their derived relatives would not be meaningful.

The second role for a lexical entry may be in providing appropriate input
to regular and productive phonological rules of the language. In generative
phonology (Chomsky & Halle, 1968), a lexical entry includes just that phonological
information about a word that is not predictable by rule, and hence
that  uniquely  identifies  a  word.  The  phonological  rules  that  are  most 
productive  and  regular  in  English  (and  thus,  perhaps,  that  are  most  likely  to 
be  learned  by  language  users  [cf. Berko,  1956;  Ohala,  1974;  Steinberg,  1973]) 
are rules of inflection. The finding that inflected words prime their stems
fully, then, is consistent with a lexicon in which inflected words have no
independent representation. Certain speech errors (for example, morpheme shifts
and strandings [Garrett, 1980a, 1980b]) have been interpreted as supporting a
similar conclusion; so have the speech patterns of some Broca's and jargon
aphasics  (see  Butterworth,  1983,  for  a  review  of  the  relevant  evidence). 

Despite  the  consistent  and  plausible  view  of  the  lexicon  provided  by 
repetition-priming  findings,  we  decided  to  investigate  priming  patterns  furth¬ 
er  for  two  reasons.  The  first  reason  is  that  repetition  effects  are  found  in 
the memory literature (e.g., Light & Carter-Sobell, 1970), in which they are
ascribed to episodic, not to lexical, memory, and episodic sources of priming
are  found  using  paradigms  very  similar  to  the  repetition  priming  paradigms 
themselves  (Feustel  et  al.,  1983). 

Moreover,  it  is  not  difficult  to  imagine  how  episodic  influences  might 
contribute  to  priming  using  the  procedures  of  Morton  or  of  Stanners  et  al. 
Subjects  may  explicitly  recall  having  seen  a  word  (or  morphological  relative) 
previously  in  the  experiment,  and  in  the  procedure  of  Stanners  et  al.,  they 
may  recall  the  response  they  made  to  it.  This  recollection  may  facilitate 
responding  to  a  primed  word.  These  episodic  sources  of  priming  are  unlikely 
to  exhaust  the  repetition  priming  that  occurs  (cf.  Jacoby  &  Dallas,  1981); 
however,  added  to  lexical  sources  of  priming,  they  are  likely  to  exaggerate 
the apparent loss in priming of an unaffixed form by a derived form as
compared to its priming by an inflected form or by itself. This exaggerated
difference  would  occur  because  derived  forms  generally  are  less  formally  or 
semantically  similar  to  stem  forms  than  are  inflected  forms  (and,  of  course, 
than  is  a  stem  word  to  itself).  Consequently,  memory  for  a  derived  prime  may 
be  less  likely  to  be  cued  during  later  presentation  of  the  unaffixed  word  than 
memory for an inflected prime or for the target word itself. Accordingly,
full  priming  between  a  word  and  itself  or  between  a  word  and  an  inflected 
variant  may  include  both  lexical  and  episodic  sources  of  priming,  whereas  par¬ 
tial  priming  as  between  a  derived  prime  and  unaffixed  target  may  include  only 
lexical  sources  of  priming. 

Our  second  reason  to  explore  further  the  patterning  of  repetition  priming 
derives  from  questions  raised  about  any  repetition  priming  having  a  lexical 
rather  than  an  episodic  origin.  The  main  question  is  whether  repetition  prim¬ 
ing  originating  in  the  lexicon  always  reflects  repeated  access  to  a  common 
lexical  entry.  In  a  recent  review  of  the  literature  on  word  recognition, 
speech  errors,  and  the  speech  and  reading  patterns  of  various  language-dis¬ 
abled  populations,  Butterworth  (1983)  disputes  the  conclusion  that  lexical 
entries  are  common  to  unaffixed  words  and  their  affixed  relatives  in  English. 
Instead,  in  his  view,  the  bulk  of  evidence  supports  separate  but  associated 
entries  for  all  words.  If  this  interpretation  is  correct,  then  repetition 
priming  may  occur  between  separate  entries  in  the  lexicon.  Our  research  in¬ 
vestigates  the  distinction  between  shared  lexical  entries  for  morphological 
relatives  and  associated,  but  separate  entries. 


Our  first  experiment  was  designed  to  test  for  episodic  sources  of  influ¬ 
ence  on  repetition  priming.  Having  found  it,  we  take  steps  in  later  experi¬ 
ments  to  reduce  or  eliminate  it  and  to  reexamine  the  pattern  of  repetition 
priming  among  stems  and  regularly  and  irregularly  inflected  and  derived 
morphological  relatives.  This  patterning  suggests  hypotheses  concerning  the 
organization of morphologically related words in the lexicon.

Experiment 1

As  an  index  of  episodic  priming,  we  chose  to  look  at  repetition  priming 
on  nonwords — both  regular  and  irregular.  The  literature  does  not  offer  a 
clear  indication  of  whether  nonword  repetition  priming  should  be  found  using  a 
lexical-decision  paradigm.  Forbach  et  al.  (1974)  report  essentially  no 
repetition  priming  among  nonwords;  however,  Scarborough  et  al.  (1977)  report 
some  priming  of  this  type.  Stanners  et  al.  (1979)  do  not  report  their  find¬ 
ings  on  nonwords. 

In  the  present  experiment,  we  examine  repetition  priming  among  words  and 
nonwords  under  conditions  replicating  those  in  which  Stanners  et  al.  found 
full  repetition  priming  of  base  forms  by  inflected  morphological  relatives  and 
partial  priming  by  derived  forms. 

Method 

Subjects.  Subjects  were  25  Dartmouth  College  undergraduates  who 
participated  in  the  experiment  for  course  credit.  All  were  native  speakers  of 
English  and  had  normal  or  corrected  vision. 

Stimulus materials. The stimuli used in the experiment were 48 English
words and 48 nonwords. The words formed two groups. One group (Inflections
Only) was presented both without suffixes, called "base" stimuli, and with
inflectional suffixes, "s" and "ed." The second group (Derivations and
Inflections) appeared as base stimuli, with the inflectional suffixes "s" and
"ed," and with two derivational suffixes ("ment" and one of "er"/"or" or
"able"/"ible"). Thus, within the second group, the effects of inflectional
and derivational forms of the same word can be compared with each other.
Words were chosen so that suffixation did not change the spelling or
pronunciation of the base.

Nonwords formed three groups. Items in the first group (Nonword,
Inflections Only) were created from real words having the same characteristics
as the real words in the Inflections Only group. To form the nonwords, one or
two letters in the real words were changed. The resulting nonwords were
orthographically regular. These were presented both in a base form and with
inflectional suffixes. Thus, they are the nonword counterparts of the first
group of real words. The second group (Irregular, Inflections Only) consisted
of ten irregular four-letter constructions, and these were also presented
both as base forms and with inflectional suffixes, "s" and "ed." Irregular
nonwords were included in the study to provide an index of episodic priming in
nonwords presumed not to have any form of representation in the lexicon. The
third group of nonwords (Nonword Derivations and Inflections) was analogous
to the second group of real words. They were orthographically regular and
were presented as base forms, with inflectional suffixes, "s" and "ed," and
with the derivational suffixes, "ment," "er"/"or," or "able"/"ible." The words
used in the experiment and the words from which the 38 regular nonwords were
formed were equated on average length and on mean and median frequency (Kučera
& Francis, 1967). Real-word base forms are listed in Appendix A.

Five test orders were created, each one including the following priming
conditions in equal numbers: (1) base as target with no prime (henceforth
B1), (2) base as prime and base as target (BB; e.g., "manage"-"manage"), (3)
inflection as prime and base as target (IB; e.g., "manages"-"manage"), and (4)
derivation as prime and base as target (DB; e.g., "management"-"manage").
Across test orders, items appeared in identical serial positions, but the
sequences differed in which version of each item served as a prime. For
example, for the base word "manage," the forms "manage," "manages," "managed,"
"management," and "manager" served as primes in the different test sequences.
In all sequences, the target was "manage." For items occurring only as base
forms and inflections in the experiment, each inflected form (i.e., "s," "ed")
occurred in two test sequences and the base form in one as primes. Inflections,
derivations, and base first occurrences were distributed proportionately
over the five test sequences.

Subjects saw each morpheme only twice: once as a prime and once as a
target. The average lag between the occurrence of a prime and the occurrence
of its target was nine intervening trials; lags ranged from 6 to 12 and each
lag  was  equally  frequent  among  words  and  nonwords.  Filler  items  were  used  as 
necessary  to  maintain  appropriate  lags.  Each  subject  completed  five  blocks  of 
56  trials  each,  the  first  of  which  was  a  block  of  practice  trials.  Primes  and 
targets  were  presented  within  one  block. 

Design.  Five  subjects  were  assigned  to  each  of  the  five  test  orders. 
The  independent  variables  were  Priming  Condition  and  Lexical  Status  (word, 
nonword).  The  main  dependent  variable  was  response  time. 

Procedure. Subjects were run individually. The experiment was run on a
time-sharing  computer  interfaced  with  a  Polytronics  response  timer.  The  sti¬ 
muli  were  presented  in  upper  case  on  a  cathode  ray  tube.  On  each  trial  the 
following sequence of events occurred: (1) a fixation string of plus signs
(  +++  +++++)  came  on;  (2)  the  terminal  bell  sounded  500  ms  before  the  fixation 
mark  went  off;  (3)  a  letter-string  appeared  as  soon  as  the  fixation  mark  dis¬ 
appeared,  and  remained  on  until  the  subject  responded;  (4)  once  the  subject 
responded  and  the  stimulus  disappeared,  the  fixation  mark  returned  and  another 
trial  began. 

For  each  subject,  the  "K"  key  of  the  computer  terminal  was  pressed  with 
the  right  index  finger  for  a  word  stimulus  and  the  "D"  key  with  the  left  index 
finger  for  a  nonword  stimulus.  The  keys  were  labeled  with  the  symbols  "W"  and 
"NW"  for  "word"  and  "nonword"  respectively.  Subjects  were  informed  that  both 
accuracy  and  speed  of  responding  were  important,  and  that  accuracy  should  be 
kept above 90% correct on each block of trials.

Between  blocks  of  trials  subjects  were  informed  of  their  mean  reaction 
times  and  proportions  correct  for  the  preceding  block  of  trials.  Blocks  were 
initiated  by  the  subject. 

Results* 

Errors  and  extreme  reaction  times  (greater  than  2000  ms  or  more  than  2.5 
standard  deviations  from  the  individual  subject’s  or  item's  mean)  were  exclud¬ 
ed  from  the  analysis.  This  procedure  excluded  less  than  one  percent  of  the 
responses. When a subject responded incorrectly to one member of a prime-target
pair, both responses were excluded from the analyses. Table 1 presents
mean response times and errors to base targets.

In all experiments, error rates will be reported in the appropriate
tables.  Analyses  on  the  error  rates  will  be  reported  only  if  they  are  signif¬ 
icant. 
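
The exclusion rule described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' procedure: the source does not say in what order the criteria were applied, whether a sample or population standard deviation was used, or how the item-based criterion was combined with the subject-based one. The sketch assumes error pairs are removed first, uses the sample standard deviation, and applies only the subject-based criterion. The field names (`subject`, `morpheme`, `rt`, `correct`) are hypothetical.

```python
from statistics import mean, stdev


def trim_trials(trials, ceiling_ms=2000, sd_cutoff=2.5):
    """Drop error pairs and extreme reaction times from a list of trial dicts.

    Each trial is a dict with keys "subject", "morpheme", "rt" (in ms), and
    "correct" (bool). A prime and its target share the same morpheme.
    """
    # Step 1: an error on either member of a prime-target pair removes both.
    error_pairs = {(t["subject"], t["morpheme"]) for t in trials if not t["correct"]}
    kept = [t for t in trials if (t["subject"], t["morpheme"]) not in error_pairs]

    # Step 2: compute each subject's mean and SD over the remaining trials.
    rts_by_subject = {}
    for t in kept:
        rts_by_subject.setdefault(t["subject"], []).append(t["rt"])
    stats = {
        s: (mean(rts), stdev(rts) if len(rts) > 1 else 0.0)
        for s, rts in rts_by_subject.items()
    }

    # Step 3: drop trials above the absolute ceiling or too far from the mean.
    trimmed = []
    for t in kept:
        m, sd = stats[t["subject"]]
        if t["rt"] > ceiling_ms:
            continue
        if sd > 0 and abs(t["rt"] - m) > sd_cutoff * sd:
            continue
        trimmed.append(t)
    return trimmed
```

An item-based pass could be added by repeating steps 2 and 3 with trials grouped by morpheme instead of by subject.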


Table 1

Mean Reaction Times for Words and Nonwords for the Various Prime-Target
Conditions of Experiment 1

                                  B1     BB          IB          DB

Words
  Inflections only                602    516(.10)    513(.12)
  Derivations and Inflections     552    499(.01)    506(.04)    525(.07)

Nonwords
  Inflections only                689    654(.09)    648(.18)
  Irregular Inflections only      625    551(0)      585(.06)
  Derivations and Inflections     691    615(.16)    653(.13)    675(.13)

Note. Error rates are in parentheses.


One-way subject and item analyses were performed on response times to
base words (conditions B1, BB, IB, and DB). Separate analyses were done on
the 32 items appearing only in inflected and base forms (Inflections Only),
and on the 16 items appearing in derived, inflected, and base forms (Derivations
and Inflections). For the Inflections Only group of words, the effect
of priming condition was significant (subjects: F(2,40) = 17.90, p < .001; items:
F(2,62) = 20.09, p < .001). Scheffé's tests revealed that the significant main
effect was due to the B1 condition differing from the BB and IB conditions
(subjects: F(2,40) = 13.8, p < .001; items: F(2,62) = 15.5, p < .001). The difference
between the BB and IB conditions was not significant.

An analogous analysis on the remaining 16 words revealed a similar outcome
for inflections, but only a partial repetition effect for derivations.
The main effect of priming condition was significant (subjects: F(3,60) = 6.17,
p = .001; items: F(3,45) = 4.87, p = .005). Scheffé's tests showed that this effect
was again due to the B1 condition differing from the BB and IB conditions
(subjects: F(3,60) = 3.77, p = .015; items: F(3,45) = 3.62, p = .02). The BB and IB
conditions did not differ from each other. In the DB condition, the mean response
time did not differ from either B1 response time or BB and IB response
times. These results are very similar in pattern to those of Experiments 1
and 3 of Stanners et al. (1979).

Similar analyses were performed on nonwords. Separate analyses were done
on response times to the regular nonwords appearing only in inflected and base
forms, the 16 regular nonwords appearing as derivations, inflections, and
bases, and the 10 irregular nonwords. The effect of priming condition was
marginally significant for the Nonword Inflections Only group in the subject
analysis only (subjects: F(2,40) = 2.96, p = .06; items: F(2,42) = 1.29, p = .28).
Priming was significant for nonwords in the Nonword Derivations and Inflections
group in the subject analysis, F(3,60) = 5.55, and marginally significant
in the item analysis, F(3,45) = 2.53, p = .06. Scheffé's tests showed the
significance of the former effect to be due to the difference between the B1
and BB conditions, F(3,60) = 4.95, p = .004. Irregular Inflections Only reached
significance in both analyses (subjects: F(2,40) = 7.24, p = .002; items:
F(2,18) = 4.25, p = .03). These effects were also attributable, as shown by
Scheffé's tests, to the difference between the B1 and BB conditions (subjects:
F(2,40) = 7.17, p = .002; items: F(2,18) = 4.16, p = .03).

Discussion 

The  real-word  results  of  Experiment  1  replicate  the  results  of  Stanners 
et  al.  (1979).  Significant  repetition  priming  of  targets  occurred  for  both 
base and inflection primes; derivations also primed their bases, but marginally.
Stanners et al. interpreted the corresponding partial repetition effect
they  found  to  signify  that  derivations  (and  irregular  inflections)  have  lexi¬ 
cal  entries  separate  from  their  base  forms. 

The  nonword  results  obtained  in  the  present  experiment  weaken  this  expla¬ 
nation.  Presumably,  nonword  repetition  effects,  particularly  those  among 
irregular  nonwords,  are  largely  episodic  rather  than  lexical  in  origin.  That 
is,  they  occur  because  subjects  remember  explicitly  having  seen  the  letter 
strings  previously  in  the  experiment  and,  perhaps,  having  made  a  particular 
response  to  them.  If  episodic  priming  affects  response  time  to  nonwords,  it 
may also contribute to repetition priming in words.* If it does, then partial
repetition effects may reflect decreased episodic priming; the less the target
in a prime-target pair looks like the prime, the less it reminds the subject
of  the  prime. 

Considerations  such  as  these  led  us  to  repeat  this  study  with  an  attempt 
to  reduce  the  effects  of  episodic  memory  on  subject  responses. 

Experiment  2 

In an effort to reduce episodic contributions to the repetition effect,
we extended the lag between primes and targets of a base morpheme from an
average of 9 items in Experiment 1 to 48 items in Experiments 2a and 2b.

In addition, we instituted a control for unequal practice on primes and
targets. Necessarily, the prime of a morpheme appears earlier in the test
sequence than its target. Consequently, subjects are less practiced on the
average when they respond to primes than when they respond to targets. Possibly,
such an effect, too, contributes to priming.

Any  asymmetrical  practice  of  this  sort  can  be  eliminated  by  a  procedure 
first  used  by  Forbach  et  al.  (1974)  but  not  used  subsequently  by  Stanners  et 
al.  (1979).  In  the  control  procedure,  the  test  sequence  of  words  is  parti¬ 
tioned  into  blocks.  In  the  first  block  of  test  trials,  only  fillers  and 
primes of morphemes are presented. In the second block, primes from the first
block  are  repeated  as  targets  interleaved  with  a  new  set  of  primes.  In  subse¬ 
quent  blocks  except  the  last,  new  primes  are  interleaved  with  repetitions  of 
primes  (now  targets)  from  the  previous  block.  In  the  final  block,  targets  are 
interleaved  with  fillers.  For  most  analyses,  data  from  the  first  and  last 
blocks  are  eliminated.  In  this  way,  analyses  are  restricted  to  comparisons  of 
responses to primes and targets made at comparable levels of practice. Across
subjects,  words  are  counterbalanced  so  that  every  morpheme  occurs  equally  of¬ 
ten  in  each  block  as  a  prime  and  target. 

Two  experiments  were  run  using  these  changes  in  procedure.  In  Experiment 
2a,  primes  were  inflections  and  base  forms.  In  Experiment  2b,  they  were 
derivations  and  base  forms. 

Method 

Subjects.  Subjects  were  72  students  from  the  same  pool  as  in  Experiment 
1.  Thirty-six  subjects  participated  in  each  of  Experiments  2a  and  b.  This 
gave  three  replications  of  all  of  the  test-order  conditions  in  each  experi¬ 
ment. 

Stimulus materials: Experiment 2a. Stimuli were 48 words and nonwords
matched in length to the words. Each word, a verb, appeared as a prime
in each of three forms: uninflected (base), inflected with "s," and inflected
with "ed." An individual subject saw each morpheme only twice: once as a
prime and once as a target. In every instance, inflected forms preserved both
the spelling and the pronunciation of the base. Targets were invariably base
forms. Real-word base forms appear in Appendix B.

Nonwords were 24 orthographically regular and 24 irregular nonwords.
Each nonword appeared as a prime in three forms: uninflected, inflected with
"s," and inflected with "ed." As for the words, targets of nonwords were
invariably "base" forms.

Experiment 2b. Stimuli were 48 words and 48 nonwords matched to the
words  in  length.  Each  word  and  nonword  appeared  as  a  prime  in  each  of  three 
forms:  unaffixed,  and  affixed  with  two  of  several  derivational  affixes  (two 
of  "ment,"  "less,"  "er,"  "ly,"  "ness,"  "able,"  "ful").  As  in  Experiment  2a, 
each subject saw a given morpheme only twice. All nonwords were orthographically
regular. Real-word bases are listed in Appendix B.

Test  orders.  The  test  sequences  consisted  of  one  practice  block  and  five 
test  blocks  each  48  trials  in  length.  For  purposes  of  counterbalancing,  the 
96  letter  strings  in  the  test  list  were  partitioned  into  four  sets.  Each  set 
included  12  words  and  12  nonwords  (four  bases,  eight  affixed  items).  A  Latin 
Square  was  used  to  order  the  sets  into  four  different  sequences.  For  example, 
the Latin Square ordering 1-2-4-3 created a test sequence in which items in
the  first  set  constituted  the  primes  of  the  first  block  of  the  test  sequence 
and  the  target  repetitions  of  the  second  block.  Primes  in  block  one  were  in¬ 
terleaved  with  filler  items.  Items  in  set  2  provided  the  primes  in  block  2  of 
the  test  sequence  and  the  target  repetitions  in  block  3.  Items  in  set  4  pro¬ 
vided  the  primes  in  the  third  block  and  the  target  repetitions  in  the  fourth 
block.  Finally,  items  in  set  3  provided  the  primes  of  block  4  and  the  target 
repetitions  in  the  final  block.  In  the  last  block,  set  3  items  were  inter¬ 
leaved  with  fillers.  The  ordering  procedure  created  a  lag  of  48  items  between 
the  prime  and  target  of  a  morpheme. 
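
The mapping from a Latin Square row to a block-by-block schedule can be written out as a short Python sketch. Two points are assumptions rather than facts from the source: the text gives only the single row 1-2-4-3, so generating the remaining rows as a balanced (Williams-type) square is a guess that happens to reproduce that row, and the function names are hypothetical.

```python
def balanced_latin_square(n):
    """Build an n x n balanced Latin square for even n (rows are set orderings).

    The first row follows the pattern 1, 2, n, 3, n-1, ...; each later row
    adds 1 (mod n) to every entry of the row above.
    """
    first = [0] + [(j + 1) // 2 if j % 2 else n - j // 2 for j in range(1, n)]
    return [[(x + r) % n + 1 for x in first] for r in range(n)]


def block_schedule(set_order):
    """Map one row of the square onto test blocks.

    Block k draws its primes from the k-th set in the row and its targets
    from the set primed in block k-1. None marks a slot filled by fillers:
    the first block has no targets and the final block has no new primes.
    """
    blocks = []
    for i in range(len(set_order) + 1):
        primes = set_order[i] if i < len(set_order) else None
        targets = set_order[i - 1] if i > 0 else None
        blocks.append({"block": i + 1, "primes": primes, "targets": targets})
    return blocks


if __name__ == "__main__":
    for row in balanced_latin_square(4):
        print(row)
    for block in block_schedule([1, 2, 4, 3]):
        print(block)
```

Because each block holds 48 trials and a target occupies the next block after its prime, placing a target at the same within-block position as its prime yields the 48-item lag; that positional detail is also an assumption of this sketch.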

The four test orders, each based on one row of the Latin Square, appeared
in three versions. The versions were identical except for the affixes on
their first occurring morphemes. For example, matched to a test order in
which, say, "pushes" appeared as a prime in block 2, were two test orders in
which "push" and "pushed" respectively appeared as prime in block 2. In each
experiment, one third of the priming items were bases; one third were words
affixed with "s" in Experiment 2a and one of the derivational affixes in
Experiment 2b; the remaining third included words affixed with "ed" in
Experiment 2a and words with other derivational affixes in Experiment 2b. This gave
12 different test orders for each of Experiments 2a and 2b.

Design.  Subjects  experienced  all  levels  of  the  independent  variable, 
priming  condition.  The  primary  dependent  measure  was  response  time. 

Procedure. The procedure was identical to that in Experiment 1.


Results* 


Response  times  and  errors  were  analyzed  as  in  Experiment  1.  Table  2 
presents  response  times  and  errors  to  base  words  and  nonwords  in  blocks  2-4 
from  Experiments  2a  and  2b. 

Response times to base words in Experiment 2a differ as a function of
their priming condition (subject analysis: F(2,70) = 54.73, p < .001; item
analysis: F(2,94) = 46.59, p < .001). Scheffé's tests reveal no significant
difference on the subjects analysis in response times to BB and IB words,
F(2,70) = 2.56, p = .08. However, the difference does reach significance on
the item analysis, F(2,94) = 3.32, p = .04. The 78 ms difference between
conditions B1 and IB is significant (subjects: F(2,70) = 29.45, p < .001;
items: F(2,94) = 22.9, p < .001). Statistically, then, the repetition
effects of inflected words on bases are full.


Table 2

Response Times to Words and Nonwords in Experiments 2a (Left) and 2b (Right)

                  Experiment 2a                      Experiment 2b
              B1     BB          IB              B1     BB          DB

Words         611    510(.07)    533(.07)        585    543(.05)    538(.03)
Nonwords      643    627(.14)    645(.10)        715    717(.17)    730(.16)

Note. Error rates are in parentheses.


Analysis of the response times to base and derived forms of Experiment 2b
gives a similar picture (subjects analysis: F(2,70) = 9.03, p < .001; items
analysis: F(2,94) = 8.24, p < .001).

Table 2 also provides the comparable findings on nonwords. Repetition
priming among nonwords was statistically absent in both studies (Experiment
2a: subjects analysis: F(2,70) = 2.02, p = .14; item analysis: F(2,94) =
1.22. In Experiment 2b both F values are less than 1.) Thus, there is no
apparent episodic repetition priming on nonwords in these experiments in which a
48-item lag is used and in which the control procedure for practice is
implemented. (In all subsequent experiments, nonword effects will be reported in
tables, but not described in the text unless they involve statistically
significant effects.)

Discussion 

Having  significantly  reduced  evidence  of  episodic  priming  in  nonword 
stimuli,  we  obtain  a  somewhat  different  picture  of  repetition  priming  in  de¬ 
rived  and  inflected  words  than  we  obtained  in  Experiment  1  and  than  Stanners 
et  al.  (1979)  report.  In  particular,  we  find  that  repetition  priming  of  a 
base  form  by  a  derivational  relative  is  as  strong  as  priming  by  an  inflection¬ 
al  relative.  Moreover,  the  priming  is  statistically  and,  in  Experiment  2b, 
numerically,  full. 
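
The numerical side of this claim can be checked directly from the word means in Table 2, taking the priming effect for a condition to be the unprimed mean (B1) minus the primed mean. The short Python sketch below does that arithmetic; the dictionary layout and function name are illustrative assumptions, and the calculation speaks only to numerical fullness, not to the statistical tests reported in the Results.

```python
# Mean response times (ms) to word targets, copied from Table 2.
TABLE_2_WORDS = {
    "2a": {"B1": 611, "BB": 510, "IB": 533},
    "2b": {"B1": 585, "BB": 543, "DB": 538},
}


def priming_effects(row):
    """Return B1 minus each primed condition's mean, in ms."""
    baseline = row["B1"]
    return {cond: baseline - rt for cond, rt in row.items() if cond != "B1"}


if __name__ == "__main__":
    for experiment, row in TABLE_2_WORDS.items():
        print(experiment, priming_effects(row))
    # 2a {'BB': 101, 'IB': 78}
    # 2b {'BB': 42, 'DB': 47}
```

In Experiment 2b the derived-prime effect (47 ms) is at least as large as the identity effect (42 ms), which is what "numerically full" means here; in Experiment 2a the inflected-prime effect (78 ms) is numerically smaller than the identity effect (101 ms).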

These findings invite one of two salient interpretations. One,
compatible  with  Butterworth's  assessment  of  the  lexicon  (1983)  is  that  repeti¬ 
tion  priming  occurs  among  separate  lexical  entries  in  the  lexicon;  it  is  not  a 
consequence  (except  in  the  case  of  exact  repetitions)  of  repeated  access  to  a 
common  lexical  entry.  A  second  is  that  it  does  reflect  repeated  access,  but  a 
lexical  entry  is  more  inclusive  than  had  previously  been  suggested  by  repeti¬ 
tion-priming  findings.  As  we  will  suggest  in  the  General  Discussion,  the 
substantive  differences  between  these  views  are  smaller,  in  light  of  con¬ 
straints  on  their  realizations  imposed  by  our  findings,  than  the  statements  of 
them  suggest. 

In  the  next  experiment,  we  further  examine  the  kinds  of 
morphologically-related  words  that  are  strongly  associated,  or  that  share  a 
lexical  entry.  We  do  so  by  examining  priming  of  an  unaffixed  form  by  affixed 
morphological  relatives  that  do  not  necessarily  preserve  the  spelling  or 
pronunciation of the stem morpheme in the unaffixed form. In addition, we
examine two types of derivationally affixed words.

Possibly,  the  derived  words  we  used  in  Experiment  2  were  special  and  gave 
rise  to  unrepresentatively  strong  priming.  Chomsky  and  Halle  (1968)  identify 
two types of suffix in English. One, neutral affixes, includes inflections
and some derivations; these affixes do not affect pronunciation of the stem
morphemes to which they are attached. In contrast, nonneutral (derivational)
affixes do affect the stem morpheme's pronunciation (e.g., "sign"-"signal").
In  Chomsky  and  Halle's  theory,  neutral  affixes  are  separated  from  the  stem 
morpheme  by  a  word  boundary,  which  prevents  application  of  phonological  rules 
over  extents  spanning  stem  and  affix.  Nonneutral  affixes  are  separated  from 
the stem by lesser, morpheme boundaries that do not prohibit application of
phonological  rules  over  the  whole  domain  of  stem  plus  affix.  In  our  Experi¬ 
ment  2b,  affixes  were  neutral  derivational  affixes.  Perhaps  it  is  not 
surprising  that  neutrally-affixed  derivations  were  as  effective  primes  as 
inflected  words. 

In  Experiment  3,  we  compare  priming  of  unaffixed  words  by  morphological 
relatives  that  do  or  do  not  share  pronunciation  or  spelling  of  the  stem 
morpheme with the unaffixed form. This allows us to compare priming by
irregular inflected words and regular morphological relatives (cf. Kempley &
Morton, 1982). In addition, in a post hoc analysis, we look specifically at
neutrally and nonneutrally affixed derivations and compare their priming
effectiveness.

Experiment  3 

In  the  present  experiment,  we  examine  priming  by  morphologically-related 
forms  in  which  either  the  pronunciation  or  the  spelling  and  pronunciation  of 
the  common  morpheme  is  not  shared  by  prime  and  target  forms.  The  experiment 
had  two  purposes  in  addition  to  the  one  just  described  of  examining  priming  by 
derived  forms  with  nonneutral  affixes.  A  related  purpose  was  to  reexamine  ef¬ 
fects  of  decreases  in  formal  overlap  (and  hence,  for  English,  in  regularity) 
between  morphologically-related  primes  and  targets  on  repetition  priming. 
Stanners  et  al.  had  found  that  priming  of  a  base  by  an  affixed  form  decreases 
as  formal  overlap  between  the  affixed  and  unaffixed  words  decreases.  Kempley 
and  Morton  (1982)  found  no  priming  between  irregular  and  regular  forms  when 
the words were presented auditorily. The present study was designed to
reexamine these priming effects under the conditions we have developed to
reduce episodic priming effects. Possibly, in the earlier studies, the
differences in priming across conditions were episodic in origin; targets
following formally identical or similar primes cued memory for the primes
while dissimilar targets did not. A final purpose of the experiment was to
separate effects of orthographic and phonological overlap between prime and
target on the magnitude of priming.

Method. 

Subjects. Thirty-six students participated in Experiment 3a and 24
different students in Experiment 3b. All came from the same subject pool used
previously.

Stimulus  materials.  Two  sets  of  twenty-four  word  triads  were  devised. 
In  one  set,  the  "Sound  Only"  set,  each  triad  included  one  base  form  and  two 
affixed  forms;  one  affixed  form  preserved  the  spelling  and  pronunciation  of 
the  unaffixed  form  (henceforth  the  "NC"  or  "no  change"  form)  and  one  preserved 
only the spelling (henceforth the "C" or "changed" form). A sample triad is
"heal," "healer," and "health." (In six items, a silent "e" in the base
morpheme was deleted in an affixed form.) The second set, the "Sound and
Spelling" set, also consisted of triads including an unaffixed form and two
affixed words. In this set, one affixed word shared both spelling and
pronunciation of the base morpheme with the unaffixed word (the "NC" form for
this set) while the other affixed word shared neither spelling nor
pronunciation with the unaffixed word (the "C" form). An example is "clear,"
"clearly," "clarify." In both sets, words in the third category were, with few
exceptions, irregular forms.

Because Experiment 2 showed no difference in priming by inflected and
derived forms that shared spelling and pronunciation with the unaffixed form,
we felt justified in mixing the two types of affixed forms in our new lists.
However, approximately equal numbers of derived forms and equal numbers of
inflected forms occurred in the Sound Only and Sound and Spelling triads, and
there were sufficient numbers of pairs of neutrally-affixed forms and
nonneutrally-affixed forms that they could be examined separately in a
post-hoc analysis.


Phonological overlap between unaffixed and affixed words was matched
across Sound Only and Sound and Spelling lists by counting each vowel,
consonant, or stress change as one change and matching number of changes
across the two lists. In addition, an effort was made to match type of change
(vowel, consonant, or stress) as closely as possible. Our final experiment
(Experiment 4b) is an auditory lexical decision experiment using these
materials, which shows that our matching efforts were successful.

Unaffixed  words  in  the  Sound  Only  and  Sound  and  Spelling  lists  were 
matched in length and frequency (Kučera & Francis, 1967). Similarly, the two
different  types  of  affixed  forms  were  matched  in  length  and  frequency  within 
and  across  the  two  lists.  Appendix  C  lists  the  word  triads  in  the  two  stimu¬ 
lus  sets. 

We created triads of nonwords from triads of words that might have
appeared as word stimuli in the experiment. They were made into nonwords by
changing one or two letters, while preserving their orthographic regularity.
Forty-eight nonword triads were created in this way.

From  the  sets  of  words  and  nonwords,  three  basic  stimulus  lists  were 
created.  Each  base  morpheme  appeared  twice  in  each  list,  once  as  a  prime  and 
once  as  a  target.  The  lists  differed  in  respect  to  which  version  of  the 
morpheme  (unaffixed,  affixed  with  no  sound  or  spelling  change,  affixed  with  a 
change)  appeared  as  the  prime.  The  target  was  always  the  unaffixed  form.  In 
each  list  there  were  sixteen  of  each  type  of  prime.  Half  of  each  set  of  16 
items  was  from  each  of  the  two  sets  of  stimulus  words.  There  were  sixteen  of 
each  type  of  nonword  prime. 
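
The rotation of prime types across the three basic lists can be illustrated with a Latin-square style assignment. The Python sketch below is illustrative only: the triad indices, the function name `build_lists`, and the modular rotation rule are assumptions introduced here, not the procedure the authors report. It simply checks that such an assignment yields sixteen primes of each type per list, eight from each stimulus set, with every triad receiving each prime type once across lists.

```python
from collections import Counter

PRIME_TYPES = ("BB", "NCB", "CB")  # unaffixed, no-change, and changed primes
STIMULUS_SETS = ("Sound Only", "Sound and Spelling")


def build_lists(n_per_set=24):
    """Assign a prime type to every triad in each of three lists.

    Hypothetical scheme: triad i in list k gets prime type (i + k) mod 3,
    so each triad is paired with every prime type exactly once across lists.
    """
    lists = []
    for k in range(len(PRIME_TYPES)):
        assignment = {}
        for stim_set in STIMULUS_SETS:
            for i in range(n_per_set):
                assignment[(stim_set, i)] = PRIME_TYPES[(i + k) % len(PRIME_TYPES)]
        lists.append(assignment)
    return lists


lists = build_lists()
for number, assignment in enumerate(lists, start=1):
    per_type = Counter(assignment.values())
    per_cell = Counter((s, t) for (s, _), t in assignment.items())
    print(number, dict(per_type), dict(per_cell))
```

Any rotation with the same balance properties would serve equally well; the modular rule is used here only because it is easy to verify.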

The  stimulus  lists  were  organized  exactly  as  in  Experiments  2a  and  2b. 
As  in  those  experiments,  four  versions  of  each  basic  list  were  created  so 
that,  across  subjects,  each  prime  occurred  equally  often  in  the  first  four 
blocks of stimuli. Each stimulus list was preceded by a practice list of 24
words and 24 nonwords randomly ordered.

Procedure.  The  experiment  was  run  twice.  The  second  experiment  (3b)  was 
identical to the first (3a) except that the stimuli were presented under
degraded  viewing  conditions  (by  turning  down  the  contrast  on  the  CRT  screen) 
in  an  effort  to  slow  response  times  and  thereby,  perhaps,  magnify  the  very 
small  departures  from  full  repetition  priming  we  observed  in  Experiment  3a. 
This  manipulation  had  no  effect  on  the  pattern  of  reaction  times  we  observed; 
therefore,  we  present  both  outcomes  together. 

The  procedure  and  instructions  to  the  subjects  were  identical  to  those 
used  in  Experiment  2. 

Design. Subjects participated at all levels of the two independent
variables, Stimulus Set (Sound Only, Sound and Spelling) and Priming Condition
(B1, BB, NCB ["no-change/base," that is, a base primed by an affixed word in
which the sound and spelling of the unaffixed base morpheme are preserved], CB
["changed-form/base," that is, a base primed by an affixed word in which the
base pronunciation or spelling and pronunciation is changed from the unaffixed
version]). The major dependent measure is response time.

Results


Extreme response times were deleted from the data as described for
Experiment 1. Results for word and nonword stimuli are presented in Table 3
collapsed over the factor Stimulus Set. Separate two-way repeated measures
analyses of variance with factors Priming Condition (B1, BB, NCB, CB) and
Stimulus Set (Sound Only, Sound and Spelling) were performed on the outcomes
of the two experiments using subjects as a random effect. Separate items
analyses were also run with one within-groups factor (Priming Condition) and
one between-groups factor (Stimulus Set). In Experiment 3a, the effect of
Priming Condition reached significance in both subjects and items analyses
(subjects: F(3,105) = 20.82, p < .001; items: F(3,138) = 12.81, p < .001).
The effect of Stimulus Set was significant in the subjects analysis, with
response times faster in the Sound Only condition, but was nonsignificant in
the items analysis (subjects: F(1,35) = 12.59, p < .001; items: F(1,46) =
2.44, p = .12). The interaction did not approach significance in either
analysis (both Fs < 1).
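
The degrees of freedom in these analyses follow from the design: 36 subjects crossed with both factors, and 48 base words (24 per stimulus set) crossed with Priming Condition but nested in Stimulus Set. The short Python sketch below applies the standard degrees-of-freedom formulas for such designs; the function names are illustrative and were introduced here, not taken from the original report.

```python
def within_df(n_units, n_levels, n_groups=1):
    """(effect df, error df) for a within-units factor.

    n_units: number of subjects or items; n_groups: number of
    between-units groups the units are nested in (1 if none).
    """
    effect = n_levels - 1
    return effect, effect * (n_units - n_groups)


def between_df(n_units, n_groups):
    """(effect df, error df) for a between-units factor."""
    return n_groups - 1, n_units - n_groups


# Subjects analysis, Experiment 3a: 36 subjects.
print(within_df(36, 4))      # Priming Condition -> (3, 105)
print(within_df(36, 2))      # Stimulus Set      -> (1, 35)

# Items analysis: 48 base words nested in 2 stimulus sets.
print(within_df(48, 4, 2))   # Priming Condition -> (3, 138)
print(between_df(48, 2))     # Stimulus Set      -> (1, 46)
```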


Table 3

Response Times (ms) in Experiments 3A and 3B

                                        B1     BB         NCB        CB
Words
  Experiment 3A                         623    558 (.05)  575 (.09)  584 (.06)
  Experiment 3B                         673    590 (.05)  612 (.06)  621 (.09)
  Neutral and nonneutral derivations    669    579        586        601
Nonwords
  Experiment 3A                         760    748 (.14)  758 (.15)  746 (.13)
  Experiment 3B                         788    777 (.18)  779 (.17)  783 (.18)

Note. Error rates are in parentheses.


The effect of prime type is due primarily to the difference between the
response to an unaffixed prime and its occurrence as a target following any of
the three primes (subjects: F(3,105) = 17.75, p < .001; items: F(3,138) =
10.80, p < .001). Among the prime conditions, the difference in the effect of
an unaffixed prime (BB) as compared to the effects of the other primes (NCB,
CB) reaches significance in the subjects analysis, but not in the items
analysis (subjects: F(3,105) = 2.80, p = .04; items: F(3,138) = 1.73, p =
.16). The additional effect of sharing or not sharing spelling or
pronunciation with the base (that is, the difference between 575 and 584) is
not significant.

We performed additional analyses on the data of Experiment 3a having
removed the six items from the Sound Only condition in which presence and
absence, respectively, of a silent "e" distinguished the base and affixed
forms. Removing these items had no effect on the outcome of the experiment.
The effect of prime condition remained highly significant; neither the effect
of Stimulus Set nor the interaction was significant.

A major finding of Experiment 3a is that priming by affixed forms is
nearly full. The priming by NCB forms replicates the outcome of Experiments
2a and 2b. Overall in the present experiment, 17 ms less priming occurs when
the prime differs from the target in being affixed, but shares sound and
spelling with the target, as compared to priming by the unaffixed form itself.
CB forms reduce priming by an additional 9 ms.
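
The arithmetic behind these priming magnitudes can be reproduced from the Experiment 3A word means in Table 3. The Python sketch below is a minimal illustration; expressing priming as a proportion of exact-repetition (BB) priming is a convenience added here, not a measure the authors report.

```python
# Mean response times (ms) for words in Experiment 3A, taken from Table 3.
rt = {"B1": 623, "BB": 558, "NCB": 575, "CB": 584}


def priming(condition, baseline="B1"):
    """Priming = unprimed baseline RT minus RT in the primed condition."""
    return rt[baseline] - rt[condition]


full = priming("BB")  # exact-repetition ("full") priming
for cond in ("BB", "NCB", "CB"):
    amount = priming(cond)
    print(cond, amount, "ms;", round(amount / full, 2), "of full priming")

# Losses relative to exact repetition, as described in the text.
print("NCB vs. BB:", priming("BB") - priming("NCB"), "ms less priming")
print("CB vs. NCB:", priming("NCB") - priming("CB"), "ms less priming")
```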

We ran Experiment 3b to ask whether, by slowing response times, we could
magnify the small differences we observed between the BB, NCB, and CB
conditions. Our manipulation, reducing the contrast on the CRT screen, slowed
response time overall by 42 ms. The slowing was significant in an items
analysis (F(1,92) = 8.23, p = .006), but not in the subjects analysis
(F(1,58) = 2.17, p = .14). There were no interactions involving the factor
Experiment in the overall analysis and, in the analysis of Experiment 3b,
there was no increase in the magnitude of the separation of BB and NCB times
on the one hand or NCB and CB times on the other. Statistical analysis of the
response times in Experiment 3b provided an identical pattern of significant
effects to the pattern observed in Experiment 3a.

In Experiment 3b, error proportions were .04, .06, and .09 on BB, NCB,
and CB items, respectively. This difference was significant (subjects:
F(2,46) = 6.07, p = .005; items: F(2,92) = 6.96, p = .002).

A final analysis examined neutrally- and nonneutrally-affixed derivations
separately from the irregular inflected forms that were included in the
stimulus sets. The purpose of the analysis was to answer the question raised by
the  finding  in  Experiment  2  that  derivations  as  well  as  inflections  fully 
primed  their  base  forms.  The  question  raised  was  whether  this  finding  is 
limited  to  neutrally-affixed  derivations,  which  preserve  the  pronunciation  of 
the  base  morpheme. 

Eight  Sound  Only  and  ten  Sound  and  Spelling  triads  permitted  a  comparison 
of  priming  by  NC  neutrally-affixed  derivations  and  by  C  nonneutrally-affixed 
derivations.  These  18  items  were  subjected  to  a  one-way  analysis  of  variance 
with the single factor Prime Condition (B1, BB, NCB, and CB). The analysis
collapsed over the nonsignificant factor, Stimulus Set, and across Experiments
3a and 3b. Only the items analysis was performed. As Table 3 reveals, the
pattern of means mirrors very closely that of the overall analysis. The
pattern of significant and nonsignificant differences is also the same as in
the overall analysis. Thus, the overall effect of Priming Condition is
significant, F(3,51) = 10.19, p < .001. Moreover, the three affixed primes
differed from the B1 condition both separately and as a group (overall
F(3,51) = 9.70, p < .001); they did not differ from each other (all Fs less
than one). This implies no substantial difference between neutrally- and
nonneutrally-affixed derivations in their ability to prime an unaffixed
morphological relative.

Discussion 

The major outcome of the present study is that there is essentially no
loss in repetition priming when the orthographic or phonological
representations of affixed primes and morphologically-related targets do not
fully overlap. Because we found no effect on priming of differences in form
between prime and target, we could not separate effects of spelling and sound
differences as intended. Experiment 4 will address that issue once again. We
did find a suggestion, significant in the subjects analysis only, of a small
loss in priming when an affixed prime precedes a base target as compared to
exact repetition priming, but there is no significant additional loss when the
affixed form differs in sound or sound and spelling from the base. This shift,
too, was a shift from regularly-affixed words to largely irregular forms.
Thus, we found no loss in priming between regularly-affixed forms and their
irregular morphological relatives. Accordingly, we conclude that, however
repetition priming effects are explained (as repeated access to a common
lexical entry, as priming among strongly associated but distinct entries, or
in some other way), the relationships of irregular and regular, derived,
inflected and unaffixed forms must be explained in fundamentally the same way.

Experiment  4 

We  designed  the  final  experiment  with  two  main  purposes  in  mind.  One  was 
to  compare  priming  in  the  auditory  and  visual  modalities.  In  Morton's  logogen 
model,  each  logogen  has  paired  auditory  and  visual  inputs  (Morton,  1981). 
That is, a word has a logogen (in the model's most recent version, an "output
logogen," but not an "input logogen") in common whether it is auditorily or
visually presented. This idea is supported by findings of some cross-modal
repetition priming (Kirsner, Milech, & Standen, 1983). However, whereas in
Experiments 3a and 3b we found strong priming of visually-presented unaffixed
words  by  irregular  morphological  relatives,  Kempley  and  Morton  (1982)  found  no 
priming  between  auditorily-presented  unaffixed  words  and  irregular,  inflected 
morphological  relatives.  Kempley  and  Morton  used  different  stimuli  than  we 
did  and  a  different  paradigm  with  longer  lags  between  prime  and  target. 
Consequently  a  variety  of  reasons  for  this  difference  are  tenable.  In  the 
present  study,  we  use  common  word  sets  and  a  common  paradigm  to  compare  prim¬ 
ing  in  the  two  modalities  directly. 

Our  second  purpose  was  to  examine  priming  when  affixed  words  appear  as 
targets  in  the  repetition-priming  paradigm.  This  allows  us  to  address  two 
questions,  one  theoretical  and  one  methodological.  The  first  question 
concerns  the  organization  of  morphological  relatives  in  the  lexicon.  One 
possibility  is  that  all  morphologically-related  words  are  uniformly  related  to 
each  other  in  the  lexicon.  Other  possibilities  can  be  imagined  as  well,  how¬ 
ever.  One  may  be  developed  by  analogy  from  a  theory  of  lexical  organization 
in Serbo-Croatian, a highly inflected language (Lukatela, Gligorijević, Kostić
&  Turvey,  1980).  In  that  so-called  "satellite-entries"  theory,  a  particular 
inflected  form,  the  nominative,  rather  than  the  root  morpheme,  is  proposed  as 
the  hub  of  an  array  of  associated  morphologically-related  words  (satellites). 
Inflected  words  other  than  the  nominative  are  associated  to  the  nominative 
form  but  not  (or  less  strongly)  to  each  other.  In  this  organization,  the 
nominative  should  prime  and  be  primed  by  other  morphologically-related  affixed 
forms  more  effectively  than  the  affixed  forms  prime  each  other.  In  English, 
the  unaffixed  base  form  is  the  most  likely  counterpart  to  the  nominative  in 
Serbo-Croatian.  If  English  has  an  analogous  organization,  then  the  unaffixed 
word  should  prime  and  be  primed  by  affixed  forms  more  effectively  than  affixed 
forms prime each other. Our experiment is designed to discriminate between
these views by examining priming of affixed words by unaffixed and other
affixed morphological relatives.

The methodological question concerns the possibility that the patterns of
priming that we obtain using our paradigm are largely products of a
paradigm-specific strategy by which subjects predict the target given the
prime. Forster and Davis (1984) and Oliphant (1983) have shown that repetition
priming is severely diminished (and absent in Oliphant's study) if subjects
are unaware that words are repeated in the experiment. In the work of Forster
and Davis, some subjects are made unaware of the repetitions because the prime
is masked. Repetition priming under these conditions is small, short-lived
and, in at least one respect (absence or presence of a frequency-by-priming
interaction), qualitatively different in pattern from priming observed when
subjects are aware of the prime.

In  other  research  (Napps,  in  preparation),  one  of  us  has  also  found  a  re¬ 
duction  in  the  magnitude  of  repetition  priming  when  the  proportion  of  targets 
in  the  experiment  is  only  .06  of  all  stimulus  items.  Nonetheless,  even  under 
these  conditions,  significant  priming  is  found  using  the  Sound  and  Spelling 
stimuli  of  Experiment  3  out  to  the  longest  lag  examined  in  that  experiment  (10 
intervening  items).  Napps'  findings  in  this  study  and  in  others  using  low 
proportions  of  repeated  items  suggest  that  the  priming  we  obtain  with  a  high 
proportion  of  related  items  does  not  create  the  appearance  of  relations  among 
morphological  relatives  that  are  unrelated  in  the  lexicon.  Rather,  they  en¬ 
hance  effects  of  existing  relations. 

To  further  address  the  question  whether  our  priming  reflects  lexical 
organization,  or  instead  reflects  predictability  of  the  target  given  the 
prime,  we  designed  Experiment  4  to  reduce  the  subjects'  ability  to  make  useful 
predictions.  In  Experiments  1-3,  targets  were  always  unaffixed  words. 
Accordingly,  given  a  prime,  subjects  could  guess  the  identity  of  the  target 
word  that  would  appear  some  50  items  later  in  the  next  block  of  stimuli.  In 
Experiment  4,  targets  were  less  predictable  than  in  earlier  experiments  be¬ 
cause  they  were  one  of  several  possible  affixed  morphological  relatives  of 
primes. 

As  a  second  assessment  of  the  role  of  prediction,  we  provide  a  separate 
analysis  of  repetition  priming  effects  on  the  very  first  block  of  the  experi¬ 
ment  in  which  repetitions  occur,  and  thus  before  subjects  have  an  opportunity 
to develop a strategy of guessing targets from primes.*

Methods 


Subjects.  Subjects  were  72  students  from  the  same  subject  pool  used 
previously. Thirty-six students participated in each of Experiments 4a and
4b. All subjects in Experiment 4b had normal hearing.

Stimulus  materials.  The  materials  were  those  used  in  Experiment  3,  with 
one  exception.  In  the  test  lists,  the  NC  affixed  form  replaced  the  unaffixed 
form  in  all  positions  in  which  it  occurred  as  a  target.  This  yielded  priming 
conditions NC1 (first occurring affixed item), NCNC (affixed word primed by
itself), BNC (affixed item primed by the unaffixed form), and CNC (affixed
item primed by an affixed morphological relative that does not preserve the
pronunciation or the spelling and pronunciation of the unaffixed morpheme).

For  Experiment  4b,  stimulus  items  were  recorded  onto  audio  tape  by  a  fe¬ 
male  native  speaker  of  English  (CAF).  These  productions  were  sampled  by 
computer at 10 kHz. This enabled the same token of each NC prime or target
item to be used in all conditions. The test orders were recorded on one
channel of an audio tape. Tone bursts were recorded on the second channel of the
tape  for  purposes  of  collecting  response  times.  The  tone  bursts  were  syn¬ 
chronized  to  the  onsets  of  acoustic  energy  of  each  stimulus  item  in  the  test 
order.  Therefore  response  times  include  word  duration  (or  as  much  of  the  word 
as  occurred  before  the  subject  made  his  or  her  button-press  response).  That 
stimulus  words  have  different  durations  is  unimportant  in  the  repetition  prim¬ 
ing  procedure  because  critical  comparisons  involve  response  times  made  to  the 
same  items  across  different  priming  conditions.  Stimulus  items  were  recorded 
onto  audio  tape  with  a  three-second  inter-stimulus  interval. 

Only  three  test  lists  were  used  in  Experiment  4b  as  compared  to  the  12 
used  in  Experiments  2,  3  and  4a.  The  three  lists  had  the  same  order  of  stimu¬ 
lus  items  but  differed  in  respect  to  which  of  the  three  prime  types  occurred 
with each target item. It was infeasible to include the additional test
orders needed to counterbalance the block in which each stimulus item appeared as
prime  and  target. 

Procedure.  The  procedure  for  Experiment  4a  was  identical  to  that  for  the 
previous  experiments. 

In  Experiment  4b,  subjects  listened  over  headphones  to  binaural  presenta¬ 
tions  of  the  test  list.  A  New  England  Digital  Able  40  minicomputer  monitored 
the  second  tape  channel  for  the  tone  bursts  and  started  a  millisecond  clock 
when  one  was  detected.  The  clock  was  read  and  a  response  and  response  time 
were  stored  when  subjects  pressed  the  labeled  "word"  or  "nonword"  button  on 
the  computer-terminal  keyboard.  If  a  response  was  not  made  within  2.5  seconds 
following  stimulus  presentation,  the  computer  stopped  the  tape  recorder  and 
printed,  "Please  make  a  response"  on  a  CRT  screen  facing  the  subject.  Receipt 
of  the  button-press  response  restarted  the  tape  recorder.  The  tape  recorder 
was  also  stopped  between  blocks  as  subjects  received  feedback  on  their  mean 
response  times  and  accuracies  for  the  block.  Subjects  initiated  successive 
blocks  by  hitting  a  key  on  the  terminal  keyboard. 

Design.  In  both  experiments,  subjects  participated  at  all  levels  of  the 
independent variables, Priming Condition (NC1, NCNC, BNC, CNC) and Stimulus
Set  (Sound  Only,  Sound  and  Spelling).  The  major  dependent  measure  was  re¬ 
sponse  time. 

Results

Errors  and  extreme  response  times  were  eliminated  from  the  analysis  as  in 
the  earlier  experiments.  Table  4  provides  the  mean  response  times  and  errors 
for Experiments 4a and 4b.

Separate two-way repeated-measures analyses of variance were performed on
the response times of Experiment 4a using subjects and items as random
factors. The independent variables were Prime Condition (NC1, NCNC, BNC, CNC)
and Stimulus Set (Sound Only, Sound and Spelling). In both analyses, the
effects of Prime Condition (subjects: F(3,105) = 14.79, p < .001; items:
F(3,138) = 16.46, p < .001) and the interaction (subjects: F(3,105) = 4.29,
p = .007; items: F(3,138) = 3.00, p = .03) were significant. Scheffé's tests
performed on the two stimulus sets separately show that, for the Sound Only
condition, all three primed conditions differ from the unprimed condition and
do not differ from each other. For the Sound and Spelling condition, however,
whereas the B and NC primes were effective, the C prime did not lead to
response times significantly faster than the no-prime condition.

Table 4

Mean Response Times (ms) in Experiments 4A (Visual) and 4B (Auditory)

                          NC1    NCNC       BNC        CNC
Words
  Experiment 4A
    Sound only            633    571 (.05)  585 (.04)  580 (.07)
    Sound and spelling    687    574 (.07)  591 (.07)  646 (.08)
  Experiment 4B
    Sound only            796    734 (.09)  770 (.07)  780 (.11)
    Sound and spelling    807    734 (.05)  762 (.06)  772 (.10)
Nonwords
  Experiment 4A           761    757 (.13)  771 (.14)  768 (.11)
  Experiment 4B           861    868 (.15)  862 (.16)  863 (.18)

Note. Error rates are in parentheses.

With two exceptions, the outcome of Experiment 4b was very similar to that
of Experiment 4a. In the analysis of response times to auditorily presented
targets, only the effect of Prime Condition was significant (subjects:
F(3,105) = 54.90, p < .001; items: F(3,138) = 13.46, p < .001). Neither the
main effect of Stimulus Set nor the interaction approached significance. The
nonsignificant interaction contrasts with the outcome of Experiment 4a. The
absence of an interaction between Stimulus Set and Priming Condition with
auditory presentation is not surprising in view of the fact that in
Experiment 4a the interaction could be ascribed to the presence or absence of
spelling differences between prime and target. The loss of the interaction
indicates that we succeeded in matching the Stimulus Sets along other relevant
dimensions.

Scheffé's tests on the effect of prime condition showed that all three
primed conditions had shorter response times than the unprimed condition. In
addition, however, the exact repetition condition differed significantly from
the other priming conditions on both subjects and items analyses. This
statistically partial priming is the second contrast with the outcome of
Experiment 4a.

In view of the apparent effect of changing spelling between affixed
primes and targets in Experiment 4a only, we compared the outcomes of the
visual and auditory experiments explicitly. We transformed response times to
difference scores by subtracting response times in the BNC condition from
those in the CNC condition separately for the Sound Only and Sound and
Spelling stimulus sets. This provides an estimate of the effects of changing
pronunciation alone (Sound Only words) or of changing both pronunciation and
spelling (Sound and Spelling words) between prime and target with visual and
auditory presentation. We performed analyses of variance on the difference
scores with factors Experiment and Stimulus Set. The effect of Stimulus Set
(subjects: F(1,70) = 4.67, p = .03; items: F(1,92) = 3.49, p = .06) and the
interaction (subjects: F(1,70) = [illegible], p = .03; items: F(1,92) = 3.18,
p = .07) were significant in the subjects analysis and marginally significant
in the items analyses. Planned comparisons on the interaction in the subjects
analysis showed that the effect of a spelling difference was greater with
visual than auditory presentation (F(1,70) = 5.10, p = .02); the difference
between the modalities of presentation on the effect of pronunciation alone
(Sound Only) was nonsignificant (F < 1).
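
The direction of these effects can be seen by forming the same CNC minus BNC difference from the word means in Table 4. The Python sketch below is only an illustration computed on cell means; the analyses reported above were run on difference scores for individual subjects and items, which are not available here, and the variable names are introduced for this example.

```python
# Mean response times (ms) for words, taken from Table 4.
means = {
    ("4a", "Sound Only"):         {"BNC": 585, "CNC": 580},
    ("4a", "Sound and Spelling"): {"BNC": 591, "CNC": 646},
    ("4b", "Sound Only"):         {"BNC": 770, "CNC": 780},
    ("4b", "Sound and Spelling"): {"BNC": 762, "CNC": 772},
}


def difference_score(cell):
    """Cost of a changed-form prime relative to an unaffixed prime."""
    return cell["CNC"] - cell["BNC"]


scores = {key: difference_score(cell) for key, cell in means.items()}
for (experiment, stimulus_set), score in scores.items():
    print(experiment, stimulus_set, score)

# Extra cost of a spelling change beyond a pronunciation change,
# separately for visual (4a) and auditory (4b) presentation.
spelling_cost = {
    exp: scores[(exp, "Sound and Spelling")] - scores[(exp, "Sound Only")]
    for exp in ("4a", "4b")
}
print(spelling_cost)
```

On these cell means the added cost of a spelling change appears only with visual presentation, which matches the direction of the interaction described in the text.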

One more analysis of the data from each experiment was performed. To ask
whether a subject's ability to guess the target from the prime accounts for
priming effects, we examined primes in the first test block and their repeated
targets or morphologically-related targets in the second block in Experiments
4a and 4b.

In Experiment 4a, across subjects, all items appeared as primes in the
first block and as targets in the second. In Experiment 4b, this
counterbalancing was infeasible; therefore, just one fourth of the items in
each condition appeared as primes and targets in the first two blocks.

Restricting our analysis to the primes in the first test block and their
targets in the second, in Experiment 4a the effects of Priming Condition were
highly significant in both subjects and items analyses (subject: F(3,105) =
8.89, p < .001; item: F(3,138) = 9.44, p < .001). The effect of Stimulus Set
(Sound Only, Sound and Spelling) was significant in the items analysis only;
the interaction did not approach significance in either analysis. Means in
the four priming conditions, NC1, NCNC, BNC, and CNC, were 684, 567, 607, and
622 collapsed over stimulus sets. These times conform closely to means
computed over all blocks presented in Table 4. A planned comparison of means
in the NC1 (unprimed) and CNC (primed by an irregular form) conditions was
significant (subject: F(1,105) = 7.23, p = .008; item: F(1,138) = 7.41, p =
.007), confirming that priming among regular and irregular affixed forms is
present even when subjects are not aware that primes or their morphological
relatives will be presented later in the experiment.

The same analysis performed on the first two blocks of trials in
Experiment 4b gave essentially the same outcome. In that set of analyses, the
effect of Priming Condition was significant (subject: F(3,105) = 11.82, p <
.001; item: F(3,138) = 6.76, p = .001). No other factors were significant.
Means were 787, 695, 753, and 740 for the NC1, NCNC, BNC, and CNC priming
conditions, respectively. A planned comparison of the conditions NC1 and CNC
was significant (subject: F(1,105) = 9.16, p < .001; item: F(1,138) = 7.98,
p = .001).

The  reaction-time  means  and  the  pattern  of  significant  effects  in  these 
restricted  analyses  conform  closely  to  those  obtained  in  the  overall  analyses. 
Thus,  they  confirm  that  repetition  priming  in  the  lexical  decision  paradigm 
does  not  require  a  strategy  of  predicting  targets  from  primes  as  the  primes 
are  presented. 
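
As a worked check on the restricted analyses, the priming effects implied by the reported means can be computed directly. The sketch below takes the unprimed condition as the baseline; the function name `priming_effects` and the key `"unprimed"` are illustrative choices, not terms from the original analysis.

```python
# Mean response times (ms) reported above for the first two test blocks.
reported_means = {
    "4a (visual)": {"unprimed": 684, "NCNC": 567, "BNC": 607, "CNC": 622},
    "4b (auditory)": {"unprimed": 787, "NCNC": 695, "BNC": 753, "CNC": 740},
}

def priming_effects(condition_means, baseline="unprimed"):
    """Facilitation in ms: baseline time minus time in each primed condition."""
    base = condition_means[baseline]
    return {
        condition: base - rt
        for condition, rt in condition_means.items()
        if condition != baseline
    }

for experiment, condition_means in reported_means.items():
    print(experiment, priming_effects(condition_means))
```

On these means, exact repetition (NCNC) gives the largest facilitation in both modalities (117 ms and 92 ms), and priming by the changed form (CNC) remains sizable (62 ms and 47 ms).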

Discussion

We designed Experiments 4a and 4b to address three questions. The first
was whether the logogen model, with its paired acoustic and visual input
logogens, was tenable, particularly in light of our findings in Experiment 3
as compared to those of Kempley and Morton (1982). In Experiment 3, we found
that visually-presented irregular words do prime their unaffixed relatives
fully. In contrast, Kempley and Morton (1982) found that auditorily-presented
unaffixed words and their irregular inflected relatives do not prime each
other. In the present study, we found very similar priming in the two
modalities.

A second question was whether we would find evidence of asymmetrical
relations among morphological relatives as researchers have found for
Serbo-Croatian (Lukatela et al., 1980). The experiment failed to support the
idea that morphological relatives have a satellite organization, with the
unaffixed base word as the center of the satellite. Instead, with one
exception, all relationships among morphological relatives appeared strong.

We did obtain one outcome suggesting both a difference between
auditorily- and visually-presented words in the lexicon and a satellite
organization among orthographically-represented words. We found that, with
visual presentation, whereas base words are primed essentially fully by
affixed morphological relatives not sharing either the spelling or the
pronunciation of the shared morpheme (Experiment 3), affixed targets that
preserve the spelling and pronunciation of the unaffixed morpheme are not
(Experiment 4a). This loss in priming apparently can be ascribed to the
spelling difference between the affixed forms since an analogous effect was
not obtained in the auditory version of the experiment (Experiment 4b).
Further evidence will be needed to determine whether this single outcome
suggestive of different organizations for phonetic and orthographic forms of
words is found reliably.

A final question addressed by the experiments was whether our procedure
creates priming effects by inviting subjects to generate candidate targets
when primes are presented. We answered this question in the negative based on
two sources of evidence. First, priming occurs over lags of nearly 50 items
even when the target is not highly predictable from the prime. More
convincing, perhaps, is the significant priming in the first two blocks of
test trials in which subjects would have no reason to adopt a guessing
strategy. These analyses yielded mean response times and patterns of
significant effects remarkably similar to those of the overall analyses. In
particular, priming even by irregular forms remained strong in analyses of
both visually- and auditorily-presented words. Therefore, we ascribe the
difference in outcome between our studies and that of Kempley and Morton
either to differences in the items used or to a longer time lag between prime
and target in the experiment by Kempley and Morton (1982). The latter appears
more likely. Kempley and Morton used inflected forms only, and, if there is a
difference in strength of priming at all between inflected and derived forms,
priming by inflected forms should be stronger.

General Discussion

Our major findings can be summarized as follows. We found that losses in
priming from full to partial or less, when exact repetition priming is
compared with priming by morphological relatives, may be ascribed at least in
part to episodic contributions to repetition priming that are larger the more
similar the prime and target. By reducing the contribution of these sources
of repetition priming, we find strong priming (statistically full in most
cases) among inflected, derived and unaffixed words, and between regular and
irregular words, with either auditory or visual presentation. Accordingly, if
repetition priming is interpreted as reflecting lexical organization as we
assume, then our findings eliminate a theory of lexical organization in which
regular inflected forms, but not derived forms or irregular inflections, share
a lexical entry with the base. Correspondingly, they eliminate a theory in
which the domain of a lexical entry is just those words that can be generated
by productive, grammatical rules of affixation (see Butterworth, 1983, for a
similar conclusion).

Our findings invite either of two extreme interpretations previously
contrasted in the literature (e.g., Butterworth, 1983). One is that full
repetition priming (after Stanners et al., 1979) or full and partial priming
(after Murrell and Morton, 1974) reflect a lexical entry shared by primes and
targets. Therefore, they signal that inflected, derived, regular and irregular
morphological relatives share a lexical entry. This interpretation offers a
way of capturing the large differences in longevity that have been found
between repetition priming and semantic priming in the literature (cf.
Henderson, 1984). Whereas we have found priming even when nearly 50 items
intervene between prime and target, in studies of semantic priming, priming is
absent by a lag of 1 or 2 items (Dannenbring & Briand, 1982; Davelaar &
Coltheart, 1975; Gough, Alford, & Holley-Wilcox, 1981; Meyer, Schvaneveldt, &
Ruddy, 1972; see also Henderson, 1984, for a direct comparison of semantic and
repetition priming).


An unappealing consequence of adopting this interpretation, however, is
that the concept of lexical entry is severely weakened. Entries that are as
encompassing as our findings imply lack any obvious utility for the language
user. The entries cannot serve as input to regular rules of affixation.
Indeed, rather than consisting of the stem morpheme, affixed by rule, each
entry perhaps must be considered a cluster of tightly associated affixed and
unaffixed morphological relatives, a conceptualization not very distinct from
the second interpretation we will consider. A second unattractive property of
the present interpretation is that each entry cannot be associated necessarily
with any semantic information at all that is common to words within the domain
of the entry (cf. Aronoff, 1976) or to any one syntactic class. Moreover, if
the entries are logogens, they do not keep an accurate frequency-based
expectancy for all words within the domain of the entry.

An alternative interpretation questions whether semantic and repetition
priming are, in fact, qualitatively distinct. Possibly,
morphologically-related words that prime each other over very long lags are
distinct words in the lexicon that are strongly related semantically. If so,
then, there are no grounds for using the priming effects as a basis for
inferring sharing of lexical entries. One advantage of this hypothesis is that
just one mechanism, not two, is required to account for priming. A second
advantage is that language users are not presumed to have lexical entries that
encompass syntactically and semantically diverse morphological relatives.

Along with other researchers (e.g., Henderson, 1984; Morton, 1981),
however, we are skeptical that morphological priming is exhaustively semantic.
For one thing, researchers attempt to use words with the strongest
associations or the maximum semantic relatedness when they test for semantic
priming; nevertheless semantic priming does not approach the longevity of
repetition priming under comparable conditions. Second, derived words tend to
drift semantically after they are coined so that their meaning is not a simple
compositional function of the meaning of the stem plus that of the affix
(Aronoff, 1976); therefore, derived words tend to be less semantically related
to morphological relatives than are inflected words. However, we obtain
equally strong priming from words of both types.

In any case, it may not be necessary to choose between a view that
repetition priming reflects repeated access to an entry and one that it
reflects associations among words in the lexicon. A third perspective on the
lexicon may capture the best features of both of these views. The perspective
that we propose is derived from recent network models of the lexicon (e.g.,
Dell, 1980, 1984; McClelland & Rumelhart, 1981; Stemberger, 1982), in
particular Dell's model, which is designed to produce speech and, in so doing,
to generate natural slips of the tongue. Dell's model provides a more useful
source than the more obviously related model by McClelland and Rumelhart
(1981), designed to generate aspects of word-recognition behavior, because
Dell's model includes a required representation of morphological structure.
His model has not been extended to orthographic representations of words, but
there are no principled barriers to doing so.

In Dell's network model, the lexicon is a hierarchy of levels of
representation including words, morphemes, syllables, syllable constituents,
phonemes, and phonetic features. Words such as "swimmer" and "swimming" have
distinct word representations (called "nodes") but connect to a common
stem-morpheme node and from there to common syllable and phoneme nodes for the
shared stem morpheme. Word nodes also have connections to semantic memory,
where, presumably, "swimmer" and "swimming" connect to common and to distinct
concepts. A word such as "swift" has distinct word, morpheme and syllable
nodes from "swimmer," but some common phonemes. Finally, a word such as
"drown" is unconnected to "swimmer" and its constituents at any level in the
lexicon, but shares concepts with it in semantic memory.
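
The relations among these four example words can be made concrete with a toy data structure. The sketch below is not Dell's implementation: the node labels, the rough phoneme symbols, the simplified syllables, and the concept names are all assumptions chosen only to mirror the description above, and `shared_levels` is a hypothetical helper.

```python
# Toy lexicon: for each word, the nodes it reaches at each level.
# All labels are illustrative placeholders, not taken from Dell's model.
lexicon = {
    "swimmer": {
        "morpheme": {"swim", "-er"},
        "syllable": {"swIm", "er"},
        "phoneme": {"s", "w", "I", "m", "er"},
        "concept": {"SWIMMING", "PERSON", "WATER"},
    },
    "swimming": {
        "morpheme": {"swim", "-ing"},
        "syllable": {"swIm", "IN"},
        "phoneme": {"s", "w", "I", "m", "N"},
        "concept": {"SWIMMING", "ACTIVITY", "WATER"},
    },
    "swift": {
        "morpheme": {"swift"},
        "syllable": {"swIft"},
        "phoneme": {"s", "w", "I", "f", "t"},
        "concept": {"FAST"},
    },
    "drown": {
        "morpheme": {"drown"},
        "syllable": {"drawn"},
        "phoneme": {"d", "r", "aU", "n"},
        "concept": {"WATER", "DANGER"},
    },
}

def shared_levels(word_a, word_b):
    """Return the nodes two words have in common, level by level.

    Levels with no shared nodes are omitted from the result.
    """
    shared = {}
    for level, nodes in lexicon[word_a].items():
        common = nodes & lexicon[word_b][level]
        if common:
            shared[level] = common
    return shared

for other in ("swimming", "swift", "drown"):
    print("swimmer /", other, "->", sorted(shared_levels("swimmer", other)))
```

In this toy version, morphological relatives overlap at every level, a form-related word overlaps only in phonemes, and a semantic associate overlaps only in semantic memory, which is the pattern the text attributes to the model.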

The  structure  of  the  model  is  well-suited,  in  general,  to  explain  our 
pattern  of  findings.  It  gives  morphological  relatives  closer  ties  to  each 
other  (other  things  equal)  than  to  other  words  in  the  lexicon;  yet  it  does  so 
without  either  requiring  morphological  relatives  to  share  a  common  word  node 
or  treating  morphological  relations  as  semantic.  Moreover,  it  can  explain  why 
we  and  others  (Kempley  &  Morton,  1982;  Murrell  &  Morton,  1974;  Stanners  et 
al.,  1979)  consistently  find  numerically  or  even  statistically  weaker  priming 
when prime and target are not exactly the same word than when they are.

One difficulty with the model, however, is that it does not allow
irregular words such as "heal" and "health" to share a morpheme node as it
must to explain our priming in Experiments 3 and 4. It is prevented from doing
so because the syllable structure and phonemic constituents of a word are
elaborated at hierarchical levels leading from the morpheme nodes, thereby
requiring that morphemes sharing a node have the same pronunciation. The model
could be adjusted by having the syllable level and the levels below it connect
directly to the word nodes and not to the morpheme level. Morphological
structure, then, would be a hierarchical level independent of levels of
phonological structure. This kind of separation may have independent
motivation from theories of metrical structure in linguistics (e.g., Selkirk,
1980). However, it remains to determine whether Dell's model, so modified,
would produce natural patterns of speech errors involving morphological
structure.

Although the structure of the network model just outlined provides an
interesting alternative to both views of the lexicon usually contrasted in the
repetition priming literature, the processing assumptions of a network model
cannot handle repetition priming at the lags over which we observe it. In
Dell's model, nodes at each hierarchical level are connected by bidirectional
excitatory lines of association. Activation of a node is progressively
incremented as activation spreads from it to its associated nodes and back
again. To prevent every node in the lexicon from being activated eventually,
activation of a node is shut down once the relevant unit has been output by
the system (in Dell's model, once a phoneme or word has been spoken). For a
variety of reasons, activation does tend to rebound after a node's activation
has been shut down; this promotes perseveration errors in speech (for example
[from Dell, 1980]: "to the bank to pick up some money" spoken as "to the bank
to pick up some bank"), and it may explain repetition priming of the magnitude
and longevity observed by Forster and Davis (1984) and by Napps (in
preparation) when subjects are unaware of repetitions in the experiment.
However, activation lasting for 48 subsequent items (or two days as
Scarborough et al., 1977, have observed) would have disastrous consequences
for the model's normal operations. Evidently, priming of the longevity we
observe is strategic; possibly, it can be seen in the context of the model as
strategic maintenance of activation of a node previously activated by stimulus
input. This strategic activation would play no role in ordinary speech and
reading, but can be exploited as we have done to strengthen repetition priming
processes that reveal the organization of words in the lexicon.
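
The processing assumptions described above (bidirectional excitatory links, activation that grows as it reverberates, shut-down after output, and a subsequent rebound) can be illustrated with a deliberately minimal sketch. It is not Dell's model: the spreading rate of 0.5, the absence of decay, the three-node network, and the function names are all assumptions made for illustration, and the rebound shown here arises from only one of the possible sources, namely feedback from neighbours that are still active.

```python
def make_links(edges):
    """Build a bidirectional adjacency map from a list of node pairs."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links

def spread(activation, links, rate=0.5):
    """One synchronous step: every node sends rate * its level to each neighbour."""
    new = dict(activation)
    for node, level in activation.items():
        for neighbour in links.get(node, ()):
            new[neighbour] = new.get(neighbour, 0.0) + rate * level
    return new

def shut_down(activation, node):
    """Reset a node to zero, as after the corresponding unit has been output."""
    reset = dict(activation)
    reset[node] = 0.0
    return reset

# Two word nodes sharing one stem-morpheme node (labels are illustrative).
links = make_links([("swimmer", "swim"), ("swimming", "swim")])

state = {"swimmer": 1.0, "swim": 0.0, "swimming": 0.0}
state = spread(state, links)            # activation reaches the shared stem
state = spread(state, links)            # and flows back and on to the relative
after_output = shut_down(state, "swimmer")
rebound = spread(after_output, links)   # the shut-down node is re-excited

print(state)
print(rebound)
```

Because nothing in this sketch decays, activation would keep growing if the steps were repeated, which is one way to see why a real model needs a shut-down mechanism and why activation persisting across dozens of items would be a problem for normal operation.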

Footnotes

¹The response times we report here differ in absolute value from times
reported  in  Napps  and  Fowler  (1983)  and  Napps,  Fowler,  and  Feldman  (1984). 
The  procedures  we  use  to  present  stimuli  to  the  computer-terminal  screen  and 
to  collect  response  times  create  constant  errors.  The  present  response  times 
have  been  adjusted  for  those  constant  errors.  The  times  in  the  earlier 
presentations  were  unadjusted.  The  adjustments  do  not  affect  the  size  in  ms 
of  priming  effects. 

²One outcome in Experiment 1 is at apparent odds with the conclusion that
some of the priming on words is episodic. We would expect the IB condition to
give rise to slightly longer response times than the exact-repetition (BB)
condition. A small difference (7 ms) in the appropriate direction does occur
in the Inflections and Derivations stimuli, but it is reversed (-3 ms) in the
Inflections Only stimuli. However, looking across experiments of our own and
of others in the literature in which a comparison can be made, in six of eight
comparisons IB exceeds BB. The differences are always small and usually
nonsignificant. That they are small is not surprising, however. Inflections
and base forms are orthographically and phonologically very similar.
Moreover, it is possible that more lexical information than simply word forms
constitutes an episodic trace in our experiments. Much of that additional
information will be the same for inflections and base forms.

³Another assessment of episodic priming in the present experiment may be
obtained by comparing response times to words in Experiments 2a and b with
corresponding times in Experiment 1. Although the mean response times may
differ across the experiments due to differences in lag, in subjects, and, in
Experiment 2b, stimulus materials, there will be no loss in episodic priming
in the B1 condition of Experiments 2a and b as compared to Experiment 1 and
therefore B1 response times should be closest across the experiments. For the
same reason, DB conditions should show little change when episodic priming is
eliminated. The BB and IB conditions should show a relative increase in
response time, however. With just one notable exception, the outcomes of
Experiments 2a and b are consistent with the predictions. Conditions B1 in
Experiment 2a and B1 and DB in Experiment 2b show less change from their
corresponding times in Experiment 1 than (respectively) conditions IB in
Experiment 2a and BB in Experiment 2b. The exceptional point is the response
time to the BB condition of Experiment 2a, which is 6 ms faster than in
Experiment 1 rather than being slower as it should be. In light of the
supportive evidence provided by the other conditions and, particularly, by the
outcome on nonwords, we ascribe the one inconsistency to sampling error or
perhaps to a floor on response times in the BB condition of Experiment 1.

⁴We should acknowledge, however, that although the difference does not
approach significance, irregular forms prime base forms numerically less than
do regular forms. More generally in our research using repetition priming, in
nearly all instances in which the prime and target are not identical and
repetition priming is statistically full, it is numerically less than full.
This is the case in most comparisons in Experiments 1-4; similar trends can be
seen in the findings of Stanners et al. (1979) and Morton (Morton, 1981;
Murrell & Morton, 1974).

⁵This analysis assesses priming when subjects have no reason to attempt
to predict a future target from a prime. It remains true, however, that by
the time the targets are first presented, subjects have been exposed to a
large number of morphologically-complex words. Possibly, this promotes a
tendency to think of morphological relatives of primes. If it does, and thus
if the set of activated relatives can remain activated over lags of 50 items
or more, this finding in itself would be interesting. Moreover, it would most
probably require an explanation in terms of activation within the lexicon.
Both the capacity and the temporal span of any temporary buffer would be
exceeded by the memory demands required to activate a set of morphological
relatives for each of the two-dozen primes presented within a 48-item span.
In any case, research by Napps (in preparation) showing repetition priming
with very low proportions of morphological relatives suggests that this
cannot be a major source of repetition-priming effects.

Experiment 1 Base Words

enlarge*      replace*
yell          gather
knead         pick
call          adjust*
settle*       attain*
discern*      laugh
sign          mow
retain        rest
weld          list
gash          govern*
walk          equip
push          pull
punish*       paw
agree*        wander
toss          develop*
talk          deploy
enchant       wait
spell         enjoy*
roll          latch
command*      manage*
disagree*     blink
invent        paint
amend         cook
pronounce*    detach*

*Used with both inflectional and derivational affixes.

Experiment 2A and 2B Base Words

Experiment 2A Base Words    Experiment 2B Base Words

enlarge       replace       develop       bright
yell          gather        manage        soft
knead         pick          govern        eager
call          adjust        assess        dark
settle        attain        announce      weak
discern       laugh         employ        stiff
sigh          mow           enjoy         vague
retain        rest          punish        complete
weld          list          detach        direct
gash          govern        disagree      appropriate
walk          equip         move          close
push          pull          enforce       glad
punish        paw           though        bold
agree         wander        fruit         blind
toss          develop       help          fond
talk          deploy        power         hard
enchant       wait          harm          awkward
spell         enjoy         care          fresh
roll          latch         rest          rich
command       manage        color         like
disagree      blink         fear          separate
invent        paint         use           vivid
amend         cook          hope          fair
pronounce     detach        thank         polite

Word Trials Used in Experiments 3 and 4

Sound Only                            Sound and Spelling

Base      No Change    Change         Base      No Change    Change

heal      healer       health         creep     creepy       crept
sign      signing      signal         defend    defendant    defensive
dream     dreamer      dreamt         sleep     sleepy       slept
edit      editor       edition        repel     repellent    repulsive
deal      dealing      dealt          speak     speaker      spoke
reside    resided      residence      decide    decided      decisive
produce   producible   productive     assume    assumed      assumption
confide   confided     confidence     sweep     sweeping     swept
inhibit   inhibiting   inhibition     invade    invader      invasion
electric  electrical   electrician    persuade  persuade     persuasive
bomb      bomber       bombard        space     spaced       spatial
mean      meaning      meant          forget    forgetful    forgotten
grade     grading      graduate       sing      singer       sang
medic     medical      medicine       fall      falling      fell
compare   comparative  comparable     induce    inducement   induction
extreme   extremist    extremely      collide   collided     collision
create    creative     creature       describe  described    description
drive     driver       driven         concede   conceded     concession
rise      riser        risen          deep      deeply       depth
revise    revising     revision       picture   picturesque  pictorial
music     musical      musician       propel    propeller    propulsion
lyric     lyrical      lyricism       wise      wisely       wisdom
critic    critical     criticize      clear     clearly      clarify
clean     cleaner      cleanse        forgive   forgiveness  forgave

GRAMMATICAL PRIMING OF INFLECTED NOUNS BY THE GENDER OF POSSESSIVE ADJECTIVES*

M. Gurjanov,† G. Lukatela,† Katerina Lukatela,† M. Savić,† and M. T. Turvey††


Abstract. Two experiments examined the effect on lexical decision
times for inflected Serbo-Croatian nouns when the nouns were preceded
by possessive adjectives (my, your, our). For any given pairing
the possessive adjective and the noun always agreed in number
(singular) and case (nominative) but only agreed half of the time in
gender (masculine or feminine). Lexical decisions were faster when
the noun targets were of the same gender as their primes. This
gender congruency/incongruency effect was shown to hold whether the
inflections of the adjective and noun were the same (as is the case
for typical Serbo-Croatian nouns) or different (as is the case for
atypical Serbo-Croatian nouns). The results are discussed in terms
of a post-lexical influence of grammatical processing on the
recognition of individual words.

"Priming"  is  a  term  referring  to  the  influence  of  one  stimulus  upon  the 
processing of another. Most experiments on "priming" with word stimuli have
considered words that are associatively related. Where lexical decision
latency is the measure of processing time it has been shown that processing is
more rapid when a word is preceded by an associate compared to when it is
preceded by a nonassociate (Lupker, 1984). Recently other relations between
and among words have come under examination. Goodman, McClelland, and Gibbs
(1981) asked whether lexical decision is speeded when successive words are
instances of word types that ordinarily occur in succession in the language.
These authors found that when two words were syntactically legal (e.g., men
swear) the target word was responded to slightly but significantly faster than
when the two words were syntactically illegal (e.g., whose swear). Wright and
Garrett (1984) used fragments of sentences as the priming context. They found
that the grammatical structure of the incomplete sentence affected the lexical
decision time for a target word that followed it. For example, modal verb
contexts preceding main verb targets and preposition contexts preceding noun
targets yielded shorter decision latencies than the contrary pairings (that
is, modal/noun and preposition/verb).

English uses word order as its major syntactical device. A language like
Serbo-Croatian exploits inflection as its primary means of conveying
grammatical information. Experiments on syntactic or grammatical priming in
Serbo-Croatian have preserved the ordinary word-type adjacencies of the
language. The grammatical violations have been introduced at the level of
inflected morphemes. For example, Gurjanov, Lukatela, Moskovljević, Savić, and
Turvey (1985) paired adjectives and nouns in a lexical decision task.
Grammatical agreement requires that the two words be of the same number, case,
and gender. This agreement is to be found at the level of the inflectional
morphemes that are suffixed to the adjective and noun stems. Gurjanov et al.
(1985) violated case agreement and found that lexical decision times for the
noun targets were
slower than when the paired words were in full agreement. In another
experiment with nouns, Lukatela, Kostić, Feldman, and Turvey (1983) observed
faster decision times when the noun's inflection was appropriate for a
preceding preposition than when it was inappropriate. And in an experiment
with verb targets by Lukatela, Moraca, Stojnov, Savić, Katz, and Turvey
(1982), lexical decisions were found to be faster when the preceding personal
pronoun agreed in person than when it disagreed in person.

How are these various instances of syntactic influences on lexical decision
to be understood? Where the context for a target word in the lexical
decision task is an associate, expediting lexical decision is often described
as due to an automatic, intralexical process. This process is not consciously
directed. It is simply a consequence of the way in which the lexical memory
is organized (Collins & Loftus, 1975; Forster, 1979). The context
mechanically increases the activation level of the target's location in
memory prior to the processing of the target. This fast mechanical priming is
generally said to be accompanied by a slower, attentional priming. Here the
idea is that the context can induce a directing of the focus of attention to
a particular region of the internal lexicon (Neely, 1977; Posner & Snyder,
1975). Following a distinction suggested by Seidenberg, Tanenhaus, Leiman,
and Bienkowski (1982), contexts that include an associate or semantic
relative and that allow, in principle, the foregoing priming processes are
termed "priming contexts." A priming context contrasts with the context under
investigation in the present paper, namely, a minimal grammatical context. A
context of this latter type, referred to as "nonpriming" by Seidenberg et al.
(1982), does not appear to precipitate automatic spreading activation
(Lukatela et al., 1982).
The difference in lexical decision times that accompanies the syntactic
congruency/syntactic incongruency contrast seems to be due to post-lexical
processes rather than lexical processes (Seidenberg, Waters, Sanders, &
Langer, 1984). The important point to be underscored is that lexical decision
is a complex operation. The accessing of the context's and of the target's
representations in the internal lexicon is but one component process. Other
processes might include (1) recognizing the grammatical relation between
context and target and (2) assigning a meaning to the context-target
structure (cf. deGroot, Thomassen, & Hudson, 1982; Forster, 1979, 1982; West
& Stanovich, 1982). If these post-lexical processes are completed before the
internal deadline for emitting a lexical decision, they may influence
positively (to shorten) or negatively (to lengthen) the response latency
(West & Stanovich, 1982).

The present experiments extend the abovementioned studies on the grammatical
priming of nouns. They examine the situation in which nouns agree or disagree
in gender with the preceding word, a possessive adjective (in English, my,
your, our, etc.). They also examine the sensitivity of the nominative
singular case to priming. The preposition priming study of Lukatela et al.
(1983) did not address this issue directly because the nominative singular
case of a Serbo-Croatian noun is not governed by a preposition. The study by
Gurjanov et al. (1985) did address this issue directly and yielded a negative
result: decision times for nouns in the nominative singular case were
unaffected by case agreement with preceding adjectives. This issue of the
priming sensitivity of the nominative singular case of nouns is important
given the demonstration that this case plays a central role in the
organization of the inflected forms of a noun in the internal lexicon
(Lukatela, Gligorijević, Kostić, & Turvey, 1980). Although the various cases
occur with different frequencies, the evidence suggests that speed of lexical
access is indifferent to case frequency. The nominative singular is accessed
fastest, with the different oblique cases accessed at roughly the same speed.

The question posed is whether the privileged lexical status of the
nominative singular is associated with a general insensitivity to grammatical
context. Is it possible that case agreement and gender agreement are not of
equal significance? If they are not, then failure to find an effect of
agreement in case (Gurjanov et al., 1985) may not extend to agreement in
gender. To anticipate, the experimental outcome is that gender agreement does
affect the processing of nouns in the nominative singular.

Experiment  1 


The lexical decision time for any given target noun in the nominative
singular form was measured in two contexts: one in which it was preceded by a
possessive adjective in the nominative singular form and one in which it was
preceded by a visually similar pseudopossessive adjective. For one half of
the noun targets the possessive adjective agreed in gender; for the other
half it disagreed. It was expected that if gender agreement influenced the
processing of nominative singular noun forms, then gender agreement would
result in faster decisions than gender disagreement.

The majority of Serbo-Croatian masculine nouns in the nominative singular
case end in a consonant. In comparison, the majority of feminine nouns in the
nominative singular end in A and the majority of neuter nouns end in either O
or E. Some masculine nouns in the nominative singular, however, end in A.
There are some feminine nouns in the nominative singular that end in a
consonant. In the first experiment only typical masculine and feminine nouns
were used. (In the second experiment both the typical and atypical types are
examined.)

Method 

Subjects. Nineteen students from the Department of Psychology, University
of Belgrade, received academic credit for participation in the experiment.

Materials. Letter strings of upper case letters were typed with an IBM
Selectric Typewriter. The letter strings were used to prepare black on white
slides.

Two types of slides were constructed. In one type, the letter string was
arranged horizontally in the upper half of a 35 mm slide and, in the other
type, letters of the same kind were arranged horizontally in the lower half of
a 35 mm slide. Letter strings in the first type of slide were always
possessive adjectives in nominative singular form (or their pseudoword
analogues), and letter strings in the second type of slide were always
ordinary nouns in nominative singular form (or their pseudoword analogues).
Altogether, there were 144 "possessive adjective" stimuli and 144 "noun"
stimuli, with each set evenly divided into words and pseudowords.

The 36 nouns were selected from the middle frequency range of a corpus of
one million Serbo-Croatian words (Kostić, 1965). Half of the nouns were
masculine and half of the nouns were feminine. A different set of 36 nouns
(18 masculine and 18 feminine) of the same frequency was used to generate the
pseudonouns. This was done by simply changing one letter in the root
morpheme. The replacement was an orthotactically and phonotactically legal
letter. Importantly, all "nouns" (words and pseudowords) were five letters in
length and consisted of two syllables. Thirty-six possessive adjective
stimuli were possessive adjectives in the nominative singular form of the
masculine gender: twelve were the first person singular (MOJ = my); twelve
were the second person singular (TVOJ = thy); and twelve were the first
person plural (NAS = our). The other 36 possessive adjective stimuli were the
same possessive adjectives in the same case and in the same proportion but of
the feminine gender (MOJA, TVOJA, and NASA). In addition to these 72
possessive adjective stimuli another 72 "possessive adjective" stimuli were
constructed with the pseudoword analogues of the three masculine and feminine
possessive adjectives, namely, MEJ, TLOJ, LAS, MEJA, TLOJA, LASA.

In total, a subject was presented with 144 pairs of stimuli in the
experimental session. Sixteen other different pairs of stimuli were used for
the preliminary training of subjects.

Design. Each noun was presented two times to a given subject. On the
two occasions a noun was presented, it was preceded by a possessive adjective
on one occasion and by a pseudopossessive adjective on the other occasion.
Importantly, between the first and second presentation of a given noun there
were always 71 presentations of other pairs. This constraint on the design of
the experiment meant that the 36 nouns and the 36 pseudonouns that were
exposed in a pseudorandom order in the first half of each experimental
session were exposed in the same order in the second half of the session.
However, the priming stimuli in the first and second half of the session were
mutually interchanged. Those nouns and pseudonouns that in the first half of
the session were preceded by possessive adjectives were preceded in the
second half by the corresponding pseudopossessive adjectives, and vice versa.
Hence, a given subject never experienced a given pair of stimuli more than
once.
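
The ordering constraint just described can be illustrated with a short sketch. It is not the authors' procedure for preparing slide sequences; the function name, the placeholder item labels, and the even split of word and pseudoword primes within the first half are assumptions made only for illustration.

```python
import random

def build_session(nouns, pseudonouns, seed=0):
    """Return a list of (prime_type, target) trials for one subject.

    Targets appear in one pseudorandom order in the first half and in the
    same order in the second half, with the prime type interchanged.
    """
    rng = random.Random(seed)
    first_prime = {}
    # Assumption: within nouns and within pseudonouns, half of the targets
    # receive a word prime in the first half of the session.
    for group in (nouns, pseudonouns):
        shuffled = rng.sample(list(group), len(group))
        for i, target in enumerate(shuffled):
            first_prime[target] = "word" if i % 2 == 0 else "pseudoword"

    order = list(nouns) + list(pseudonouns)
    rng.shuffle(order)  # one pseudorandom order, reused in both halves
    swap = {"word": "pseudoword", "pseudoword": "word"}
    first_half = [(first_prime[t], t) for t in order]
    second_half = [(swap[p], t) for p, t in first_half]
    return first_half + second_half

# Hypothetical item labels standing in for the 36 nouns and 36 pseudonouns.
nouns = [f"NOUN{i:02d}" for i in range(36)]
pseudonouns = [f"PSEUDO{i:02d}" for i in range(36)]
session = build_session(nouns, pseudonouns)
```

Because each half contains 72 trials in the same target order, the two presentations of any target are 72 positions apart, which leaves exactly 71 intervening pairs.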

As noted, for any given subject a target noun appeared only twice, with
one appearance preceded by a pseudopossessive adjective. The other appearance
was preceded by a possessive adjective. The possessive adjective context
could either agree or disagree in gender with the noun. That is, if the noun
were masculine, then the preceding possessive adjective could be either
masculine or feminine. Consequently, for a given subject, the nouns that
occurred in an appropriate possessive adjective context were different from
the nouns that occurred in an inappropriate possessive adjective context. In
summarizing the data in Table 1 the fact that different word sets comprised
the appropriate and inappropriate pairings is marked by the use of two
exemplary masculine nouns, LONAC and SAPUN, and two exemplary feminine nouns,
TABLA and PTICA. There is a further feature of the design to be remarked
upon. If a target noun, say, LONAC, was preceded by a possessive adjective of
the proper gender, say, MOJ, on one of its appearances, then it was preceded
by a visually similar pseudopossessive adjective, say, MEJ, on the other
appearance. Similarly, if LONAC was preceded by the inappropriate context
MOJA on one appearance, it was preceded by the pseudopossessive adjective
MEJA on the other. The design therefore permitted the direct comparison
within a subject of lexical decision times to the same word in two different
contexts: one in which the prime agreed or disagreed grammatically and one in
which the prime was a pseudoword.

To reiterate, a given subject saw 144 different pairs of stimuli: one
quarter of the 144 trials consisted of possessive adjective-noun pairs (half
of which agreed and half of which disagreed in gender), one quarter consisted
of pseudopossessive adjective-noun pairs, one quarter consisted of possessive
adjective-pseudonoun pairs, and one quarter consisted of pseudopossessive
adjective-pseudonoun pairs. The presentation order was pseudorandom.

Procedure. On each trial, two slides were presented. The subjects' task
was to decide as rapidly as possible whether the letter string contained in a
slide was a word. Each slide was exposed in one channel of a three-channel
tachistoscope (Scientific Prototype, Model GB) illuminated at 10.3 cd/m².
Both hands were used in responding to the stimuli. Both thumbs were placed on
a telegraph key close to the subject and both forefingers on another
telegraph key two inches further away. The closer key was depressed for a
"no" response (the string of letters was not a word), and the farther key was
depressed for a "yes" response (the string of letters was a word).

Latency was measured from the onset of a slide. The subject's response
to the first slide terminated its duration and initiated the second slide (at
effectively a delay of 0 ms) unless the latency exceeded 1300 ms, in which
case the second slide was initiated automatically. The duration of the second
slide, unlike that of the first, was fixed at 1300 ms.

Results 

A  mean  reaction  time  was  computed  for  each  subject  on  each  type  of  word 
pair.  Latencies  shorter  than  300  ms  and  longer  than  1300  ms  were  excluded  as 
were  latencies  associated  with  incorrect  responses.  The  total  exclusions  did 
not  exceed  1.4  percent  of  all  responses.  The  mean  latencies  for  the  primes, 
namely,  masculine  possessive  adjective  (e.g.,  MOJ),  feminine  possessive 
adjective  (e.g.,  MOJA),  pseudo  masculine  possessive  adjective  (e.g.,  MEJ),  and 
pseudo  feminine  possessive  adjective  (e.g.,  MEJA)  were:  542  ms,  543  ms,  638 
ms,  and  637  ms,  respectively. 
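
The exclusion rule stated above (latencies under 300 ms or over 1300 ms, and latencies on incorrect responses) can be sketched as follows. The function name and the `(latency_ms, correct)` data layout are assumptions for illustration; the paper does not describe how the data were stored, and the latencies in the example are invented.

```python
def trimmed_mean_latency(trials, low_ms=300, high_ms=1300):
    """Mean latency for one subject and one pair type after exclusions.

    trials: iterable of (latency_ms, correct) tuples.
    Returns (mean of retained latencies or None, fraction excluded).
    """
    trials = list(trials)
    if not trials:
        return None, 0.0
    # Keep only correct responses whose latency lies within the window.
    kept = [rt for rt, correct in trials if correct and low_ms <= rt <= high_ms]
    excluded_fraction = (len(trials) - len(kept)) / len(trials)
    mean_rt = sum(kept) / len(kept) if kept else None
    return mean_rt, excluded_fraction

# Invented latencies: one too fast, one error, one too slow, two retained.
example = [(250, True), (600, True), (700, False), (640, True), (1400, True)]
mean_rt, excluded = trimmed_mean_latency(example)
```

In this toy case the two retained latencies (600 and 640 ms) give a mean of 620 ms, and three of the five responses are excluded.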

Because of the design of the experiment, a subject saw any given masculine
noun in the nominative singular, for example, LONAC, preceded once by a
masculine possessive adjective in nominative singular, for example, MOJ, and
preceded once by a mutated version of that same masculine possessive
adjective, viz., MEJ. Likewise, the subject saw any given feminine noun in
the nominative singular, for example, PTICA, preceded once by MOJA and once by
MEJA. The same arrangement was true for the incongruent pairings: MOJA SAPUN
with MEJA SAPUN for a masculine noun, and MOJ TABLA with MEJ TABLA for a
feminine noun. These relations and comparisons are displayed in Table 1.


Table 1

Lexical Decision Times (ms) for Examples of Masculine and Feminine Nouns
Primed by Real and Pseudopossessive Adjectives

                                                        Noun gender
Type of prime          Prime inflection    masculine             feminine

possessive adjective   Masculine (ø)       608±41^a (LONAC)^b    665±39 (PTICA)
                       Feminine (A)        672±27 (SAPUN)        593±36 (TABLA)

pseudopossessive       Masculine (ø)       653±40 (LONAC)        640±36 (PTICA)
adjective              Feminine (A)        623±42 (SAPUN)        614±27 (TABLA)

^a Mean reaction time and standard deviation.
^b Example of noun.


Only effects that were significant by both the analysis based on subject
means and the analysis based on item means are reported. The question of
major interest is whether lexical decision times were affected by the
grammatical relation between the prime and the target. This effect, if it
exists, should be found in the two-way interaction between target gender and
prime inflection and the three-way interaction among target gender, prime
inflection, and lexicality. Both interactions proved to be significant:
F(1,18) = 74.93, MSe = 641, p < .001 and F(1,18) = 52.43, MSe = 877,
p < .001 by the subject analysis; and F(1,32) = 17.18, MSe = 19220, p < .001
and F(1,32) = 15.68, MSe = 21794, p < .001 by the item analysis. Also
significant was the main effect of prime inflection: F(1,18) = 19.79,
MSe = 291, p < .001 and F(1,32) = 4.10, MSe = 4591, p < .05 by the subjects
and items analyses, respectively. On the average, lexical decisions following
the uninflected primes (e.g., MOJ, MEJ) were slower than those following the
inflected primes (e.g., MOJA, MEJA): 642 ms versus 625 ms.

The analysis supports the hypothesis that lexical decision on a noun in
the minimal grammatical context provided by a possessive adjective depends on
whether or not the noun and possessive adjective agree in gender. For
masculine nouns the difference between the inappropriate pairing and the
appropriate pairing was 64 ms; for feminine nouns it was 72 ms. These
magnitudes are considerably larger than the inappropriate-appropriate
difference reported by Goodman et al. (1983). Comparisons of English word
sequences such as "men swears" (appropriate) and "whose swears"
(inappropriate) yielded small differences of 19 ms (Experiment 1) and 13 ms
(Experiment 2).
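
The 64 ms and 72 ms differences, and the 642 ms versus 625 ms contrast for prime inflection, follow directly from the cell means in Table 1. The sketch below reproduces that arithmetic; the dictionary layout and function names are illustrative assumptions, and the cells are keyed by prime and noun gender rather than by the example words.

```python
# Mean lexical decision times (ms) from Table 1, keyed by
# (prime lexicality, prime gender, noun gender).
TABLE1 = {
    ("word", "masc", "masc"): 608, ("word", "masc", "fem"): 665,
    ("word", "fem", "masc"): 672, ("word", "fem", "fem"): 593,
    ("pseudo", "masc", "masc"): 653, ("pseudo", "masc", "fem"): 640,
    ("pseudo", "fem", "masc"): 623, ("pseudo", "fem", "fem"): 614,
}

def congruency_effect(table, noun_gender):
    """Incongruent minus congruent latency for real possessive adjectives."""
    other = "fem" if noun_gender == "masc" else "masc"
    incongruent = table[("word", other, noun_gender)]
    congruent = table[("word", noun_gender, noun_gender)]
    return incongruent - congruent

def mean_by_prime_gender(table, prime_gender):
    """Unweighted mean over all cells sharing one prime inflection."""
    cells = [rt for (_, prime, _), rt in table.items() if prime == prime_gender]
    return sum(cells) / len(cells)

masculine_effect = congruency_effect(TABLE1, "masc")  # 672 - 608
feminine_effect = congruency_effect(TABLE1, "fem")    # 665 - 593
```

The unweighted cell averages come to 641.5 ms for the masculine (uninflected) primes and 625.5 ms for the feminine (inflected) primes, in line with the rounded values reported in the text.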



Gurjanov  et  al:  Grammatical  Priming  of  Inflected  Nouns 


The grammatical congruent-grammatical incongruent contrast is a reliable
measure of grammatical priming. Less reliable but of larger theoretical
importance is the measure of grammatical priming that divides the congruency
effect into facilitative and inhibitory components. This division rests on
the availability of a suitable baseline. In the present experiment nouns
following pseudowords provide the baseline. What is missing, however, is an
independent evaluation of the effect of pseudowords on lexical decision.
Another weakness of the current baseline is that a pseudopossessive
adjective-noun sequence involves a negative response followed by a positive
response, raising the possibility of an inhibitory influence on the noun
decision-making process. The analysis that follows should be interpreted with
these caveats in mind.


As noted above, because of the design of the experiment it is possible to
make a within-subject comparison of a noun with itself in two different
contexts, namely, those of possessive adjective and pseudopossessive
adjective. Facilitation of lexical decision is here defined operationally by
a significant positive difference between pairs of type MOJ LONAC (congruent
prime) and MEJ LONAC (nonsense prime) or MOJA PTICA (congruent prime) and
MEJA PTICA (nonsense prime), and inhibition of lexical decision is defined by
a significant negative difference between pairs of type MOJA SAPUN
(incongruent prime) and MEJA SAPUN (nonsense prime) or MOJ TABLA (incongruent
prime) and MEJ TABLA (nonsense prime). Protected t-tests (Cohen & Cohen,
1975; the error term from the ANOVA is used as the estimate of the variance)
on subject means revealed that there was facilitation, t(18) = 4.79,
p < .001 and t(18) = 2.29, p < .05 for the masculine (LONAC) and feminine
(PTICA) situations, respectively, and that there was inhibition,
t(18) = 4.49, p < .001 and t(18) = 2.50, p < .05 for the masculine (SAPUN)
and feminine (TABLA) situations, respectively. These outcomes were almost
fully corroborated by protected t-tests on item means: t(32) = 3.49,
p < .001 and t(32) = 1.75, p < .05 for the masculine (LONAC) and feminine
(PTICA) situations, respectively; t(32) = 3.72, p < .001 and t(32) = 1.59,
p > .05 for the masculine (SAPUN) and feminine (TABLA) situations,
respectively.
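
The operational definitions above amount to subtracting the word-prime latency from the pseudoword-prime (baseline) latency for the same target. A minimal sketch using the Table 1 means follows; the tuple layout and names are assumptions, and the output is the raw difference only, not the protected t-tests, which require the ANOVA error term.

```python
# (noun gender, relation to prime, word-prime RT, pseudoword-prime RT), in ms,
# taken from the cell means of Table 1.
PAIRS = [
    ("masculine", "congruent", 608, 653),
    ("feminine", "congruent", 593, 614),
    ("masculine", "incongruent", 672, 623),
    ("feminine", "incongruent", 665, 640),
]

def baseline_difference(word_rt, pseudo_rt):
    """Positive values indicate facilitation, negative values inhibition."""
    return pseudo_rt - word_rt

differences = {
    (gender, relation): baseline_difference(word_rt, pseudo_rt)
    for gender, relation, word_rt, pseudo_rt in PAIRS
}
```

On these means the congruent primes show differences of +45 ms (masculine) and +21 ms (feminine), and the incongruent primes show differences of -49 ms (masculine) and -25 ms (feminine).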

An  ANOVA  conducted  on  the  pseudonoun  data  revealed  no  main  effects  or 
interactions. 


Experiment  2 

The inflectional morphemes of a masculine possessive adjective in
nominative singular and a typical masculine noun in nominative singular are
identical, viz., ø. Similarly, the inflectional morphemes of a feminine
possessive adjective in nominative singular and a typical feminine noun in
nominative singular are identical, viz., A. The second experiment examines
the contribution of this identity in inflectional morphemes to the gender
congruency/incongruency effect observed in Experiment 1.

As noted above, there are (very few) masculine nouns that end in A in the
nominative singular and (relatively more) feminine nouns that end in ø in the
nominative singular. It is possible, therefore, to have a possessive
adjective and noun that agree in nominative singular case and in gender but
that do not share the same inflected ending, for example, MOJ DEDA (my
grandfather), where both words are masculine nominative singular, and MOJA
MATER (my mother), where both words are feminine nominative singular. The
second experiment exploits pairs of the preceding kind along with pairs
constructed, as before, from typical masculine and feminine nouns, for
example, MOJ LONAC and MOJA PTICA. If the gender congruency/incongruency
effect is not tied to the visual or linguistic identity of the prime and
target suffixes, then the effect should hold for possessive adjective-noun
pairs constructed with atypical nouns as it does for such pairs constructed
with typical nouns. If MOJ LONAC is faster than MOJA LONAC, then MOJ DEDA
should be faster than MOJA DEDA. The latter observation would rule out the
hypothesis that the effect obtained in the first experiment was due to
dimensions of visual similarity rather than grammatical similarity.


The design of the second experiment differed from that of the first. In
the second experiment, unlike the first, no noun or pseudonoun target was
repeated in the sequence of prime-target pairs seen by a subject. In the
second experiment, unlike the first, the nouns preceded by congruent
possessive adjectives were also the nouns preceded by incongruent possessive
adjectives. This was achieved by a between-subjects manipulation. Where one
group of subjects saw a given noun preceded by a grammatically appropriate
prime, another group of subjects saw the same noun preceded by a
grammatically inappropriate prime. The analysis of the experiment focuses on
the grammatical congruency/grammatical incongruency effect. What few merits
the analysis into facilitation and inhibition effects might have had in the
first experiment, given its within-subject comparison of a target noun
preceded by a word prime and a pseudoword prime, were reduced further by the
between-subject design of the second experiment. Consequently, no attempts
were made in the second experiment to quantify facilitation and inhibition.


Method 

Subjects. Fifty-two students from the Department of Psychology,
University of Belgrade, received academic credit for participation in the
experiment. A subject was assigned to one of four subgroups according to the
order of the subject's appearance at the laboratory, for a total of thirteen
subjects per subgroup. None of the subjects had participated in Experiment 1.




Materials. The stimuli were of the same physical appearance as in
Experiment 1. Altogether, 128 "possessive adjective" stimuli and 128 "noun"
stimuli were constructed, with each set evenly divided into words and
pseudowords. The 64 real possessive adjective stimuli represented the
possessive adjectives MOJ, MOJA (my) and TVOJ, TVOJA (your). The 64
pseudopossessive adjective stimuli were derived from the possessive
adjectives by replacement of a consonant or a vowel (MEJ, MEJA, MOS, MOSA,
FOJ, FOJA, KVOJ, KVOJA, TVOK, TVOKA, TVEJ, TVEJA).

Thirty-two of the nouns in Experiment 2 were similar to those used in
Experiment 1: there were 16 typical masculine nouns and 16 typical feminine
nouns. In comparison to Experiment 1 an additional set of 32 atypical nouns
was also used: 16 masculine nouns ending in the vowel A and 16 feminine nouns
ending in a consonant. The 64 pseudonouns were generated from these typical
and atypical nouns by replacing the initial or middle consonant by another
consonant of the same phonemic class. Consequently, 32 pseudonouns ended in a
consonant and 32 pseudonouns ended in A.

In total, there were 512 different pairs of stimuli of which a given
subject saw 128 pairs. Thirty-two other pairs of stimuli were used for the
preliminary training of subjects.

Design.  The  constraint  of  the  design  of  the  experiment  was  that  a  given 
subject  never  experienced  a  given  noun  or  pseudonoun  more  than  once. 

As mentioned, a given subject saw 128 different pairs of stimuli. Each
subject saw the same nouns and pseudonouns as every other subject but not
preceded by the same possessive adjective or pseudopossessive adjective type.
Consider, for example, the masculine noun LONAC. In one group of subjects
this noun was preceded by a possessive adjective in the same case, number, and
gender (e.g., MOJ); in a second group it was preceded by a possessive
adjective of the same case and number but of a different gender (e.g., MOJA);
in a third group it was preceded by a pseudoword visually similar to the
congruent prime (e.g., MEJ or MOS or FOJ); and in a fourth group it was
preceded by a pseudoword visually similar to the incongruent prime (e.g.,
MEJA or MOSA or FOJA). In one half of the 128 trials the second stimulus in a
pair was a noun, and in the other half the second stimulus was a pseudonoun.
In one half of the 32 possessive adjective-noun trials a given subject saw 8
typical masculine and 8 typical feminine nouns. There was a similar division
for the 32 pseudopossessive adjective-noun trials, the 32 possessive
adjective-pseudonoun trials, and the 32 pseudopossessive adjective-pseudonoun
trials. Within each combination gender-congruent possessive adjectives and
gender-incongruent possessive adjectives appeared equally often.
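
One way to realize a between-subjects arrangement of this kind is to rotate the four prime conditions across the four groups, in the manner of a Latin square. The sketch below is an assumption about how such an assignment could be generated, not a description of the authors' lists; the condition labels, function name, and item labels are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the four prime conditions a noun can receive.
PRIME_CONDITIONS = (
    "congruent word",
    "incongruent word",
    "congruent-like pseudoword",
    "incongruent-like pseudoword",
)

def rotate_conditions(targets):
    """Return {group: {target: condition}} by rotating conditions over groups.

    Each group sees every target exactly once, and across the four groups
    every target appears in all four prime conditions.
    """
    n = len(PRIME_CONDITIONS)
    return {
        group: {
            target: PRIME_CONDITIONS[(i + group) % n]
            for i, target in enumerate(targets)
        }
        for group in range(n)
    }

# Placeholder labels standing in for the 64 nouns of Experiment 2.
nouns = [f"NOUN{i:02d}" for i in range(64)]
plan = rotate_conditions(nouns)
per_condition_in_group_0 = Counter(plan[0].values())
```

With 64 nouns, each group receives 16 nouns per condition, that is, 32 nouns after real possessive adjectives (16 congruent, 16 incongruent) and 32 after pseudopossessive adjectives, which matches the trial counts given in the text.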

Procedure. The procedure was the same as in Experiment 1.

Results 

A  mean  reaction  time  was  computed  for  each  subject  in  each  of  the  four 
groups.  The  criteria  for  excluding  responses  were  the  same  as  in  Experiment 
1.  Approximately  3.5  percent  of  all  responses  were  excluded  from  the  analyses 
by  these  criteria. 

The first question to be addressed is whether the results of the first
experiment, which were obtained with typical masculine and feminine nouns,
were replicated in the second experiment. Table 2 presents the data for
typical masculine and feminine nouns as a function of prime lexicality and
prime inflection. A group x prime lexicality x target gender x prime
inflection analysis of variance suggests that the outcome of Experiment 2 was
very similar to that of Experiment 1: Target gender was significant,
F(1,48) = 15.69, MSe = 2610, p < .001; target gender by prime inflection was
significant, F(1,48) = 20.53, MSe = 4534, p < .001; and target gender by
prime inflection by prime lexicality was significant, F(1,48) = 30.47,
MSe = 2232, p < .001. Although the main effect of groups was not significant,
there were significant interactions involving groups: group by prime
inflection, F(3,48) = 13.66, MSe = 2222, p < .001; group by prime lexicality,
F(3,48) = 5.57, MSe = 5670, p < .01; group by prime inflection by prime
lexicality, F(3,48) = 11.30, MSe = 1958, p < .001; and the four-way
interaction. These interactions identify the differences in the pairs of
stimuli assigned to the groups.

As with Experiment 1 it can be claimed that lexical decision times for
target nouns of the typical type depended on whether the inflected ending of
the prime was consistent with the gender of the noun. This dependency is
greater for word-word pairs than for pseudoword-word pairs. Protected t-tests
confirmed the difference between congruent word-word pairs and incongruent
word-word pairs for the masculine nouns, t(48) = 6.49, p < .001 and between
congruent word-word pairs and incongruent word-word pairs for the feminine
nouns, t(48) = 5.52, p < .001. However, neither the masculine nor the
feminine comparison was significant for the pseudoword-word pairs.


Table 2

Lexical Decision Times and Error Rates for Typical Masculine and Feminine
Nouns as a Function of Prime Lexicality and Prime Inflection

                                                      (typical) Noun gender
Type of prime          Prime inflection   Measure     masculine (ø)   feminine (A)

possessive adjective   Masculine (ø)      RT^a        657±93          687±92
                                          Errors^b    1               2.4
                       Feminine (A)       RT          717±112         636±79
                                          Errors      4.8             0.50

pseudopossessive       Masculine (ø)      RT          670±84          661±91
adjective                                 Errors      5.8             1.4
                       Feminine (A)       RT          666±80          647±73
                                          Errors      4.3             1.9

^a Mean reaction time (ms) and standard deviation.
^b Percentage of responses that were incorrect.


Table 3

Lexical Decision Times and Error Rates for Atypical Masculine and Feminine
Nouns as a Function of Prime Lexicality and Prime Inflection

                                                      (atypical) Noun gender
Type of prime          Prime inflection   Measure     masculine (A)   feminine (ø)

possessive adjective   Masculine (ø)      RT^a        712±107         692±108
                                          Errors^b    5.8             4.8
                       Feminine (A)       RT          734±102         647±86
                                          Errors      7.7             1.4

pseudopossessive       Masculine (ø)      RT          723±104         675±75
adjective                                 Errors      8.2             5.3
                       Feminine (A)       RT          730±99          652±69
                                          Errors      8.7             2.4

^a Mean reaction time (ms) and standard deviation.
^b Percentage of responses that were incorrect.

Is the gender congruency/incongruency effect exhibited by possessive
adjective-noun pairs constructed with atypical nouns? Table 3 presents the
data for the atypical masculine and feminine nouns as a function of prime
lexicality and prime inflection. Comparison of Table 3 with Table 2 suggests
a similar, though not identical, pattern of results. An analysis of variance
conducted over the combinations of groups, prime lexicality, target gender,
and prime inflection yielded significant effects for target gender,
F(1,48) = 99.87, MSe = 3495, p < .001 and for the interaction of target
gender with prime inflection, F(1,48) = 21.68, MSe = 2869, p < .001. There
was no main effect of groups but all the interactions with group were
significant, as above. Like typical nouns, atypical nouns exhibit a gender
congruency/incongruency effect but, unlike typical nouns, the magnitude of
the effect is less dependent on the lexicality of the prime.
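
The contrast between typical and atypical nouns can be made concrete by computing the incongruent-minus-congruent difference separately for word and pseudoword primes from the cell means of Tables 2 and 3. The sketch below is illustrative only; the dictionary layout, the key "null" for the ø inflection, and the function name are assumptions.

```python
# (prime lexicality, prime inflection) -> (masculine-noun RT, feminine-noun RT),
# in ms. "null" stands for the ø ending of masculine primes, "A" for feminine.
TYPICAL = {
    ("word", "null"): (657, 687), ("word", "A"): (717, 636),
    ("pseudo", "null"): (670, 661), ("pseudo", "A"): (666, 647),
}
ATYPICAL = {
    ("word", "null"): (712, 692), ("word", "A"): (734, 647),
    ("pseudo", "null"): (723, 675), ("pseudo", "A"): (730, 652),
}

def congruency_effects(table, lexicality):
    """Incongruent minus congruent latency for each noun gender.

    Masculine primes carry the null ending and feminine primes carry A, so a
    masculine noun is congruent after a null-inflected prime and a feminine
    noun is congruent after an A-inflected prime, whatever the noun's ending.
    """
    masc_after_null, fem_after_null = table[(lexicality, "null")]
    masc_after_a, fem_after_a = table[(lexicality, "A")]
    return {
        "masculine": masc_after_a - masc_after_null,
        "feminine": fem_after_null - fem_after_a,
    }

typical_word = congruency_effects(TYPICAL, "word")
typical_pseudo = congruency_effects(TYPICAL, "pseudo")
atypical_word = congruency_effects(ATYPICAL, "word")
atypical_pseudo = congruency_effects(ATYPICAL, "pseudo")
```

For typical nouns the differences are 60 and 51 ms with word primes against -4 and 14 ms with pseudoword primes; for atypical nouns they are 22 and 45 ms against 7 and 23 ms, so the gap between word and pseudoword primes is visibly smaller in the atypical case.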

It is noteworthy that there was a large difference in errors between
atypical masculine nouns (more) and atypical feminine nouns (fewer),
F(1,48) = 11.92, p < .001 and that the errors committed on these two noun
types depended differently on the inflection of the preceding prime,
F(1,48) = 4.44, p < .05. The same analysis on the typical nouns revealed that
the masculine nouns were again the source of most errors, F(1,48) = 7.65,
p < .01, but that there was no interaction of target gender with prime
inflection. Overall, the errors for both analyses follow the pattern of the
decision latencies (compare Tables 2 and 3) but it is not obvious why, in all
analyses (Experiment 1 and Experiment 2), latencies are longer on average and
errors are greater on average for masculine nouns.

The third question is whether the gender congruency/incongruency effect
differs between typical and atypical masculine nouns. The number of masculine
nouns that end in A is very small, as noted, and the number of nouns in this
category used in the experiment almost exhausts the category. By and large,
masculine nouns inflected with A in the nominative singular occur less
frequently than masculine nouns inflected with ø in the nominative singular.
A group x prime lexicality x prime inflection x target inflection (typical
vs. atypical type) analysis of variance was conducted. The main effect of
prime inflection was significant, F(1,48) = 4.99, MSe = 9249, p < .05:
ø-inflected primes were associated with faster lexical decisions (691 ms)
than A-inflected primes (711 ms). The difference between typical and
atypical nouns was significant, F(1,48) = 83.39, MSe = 2768, p < .001; the
atypical nouns were responded to more slowly (723 ms) than the typical nouns
(680 ms), probably because of their lower frequency of occurrence. The
interaction of prime lexicality and prime inflection was significant,
F(1,48) = 4.28, MSe = 9822, p < .05, as was the interaction of prime
lexicality and target inflection, F(1,48) = 5.97, MSe = 2145, p < .01. There
was no two-way interaction between inflection of the prime and the typicality
of the inflection of the noun. Lexical decision times for typical masculine
nouns preceded by the congruent ø-inflected primes (real and pseudo) were 33
ms shorter, on the average, than lexical decision times for typical masculine
nouns preceded by incongruent A-inflected primes (real and pseudo). This
average difference for atypical masculine nouns was 15 ms. There was, however,
a significant three-way interaction among prime lexicality, prime inflection,
and target inflection (typical vs. atypical), F(1,15) = 5.06, MSe = 3193,
p < .05. Inspection of Tables 2 and 3 reveals that the inflection of the
pseudoadjective prime did not matter for either typical or atypical nouns.
The congruency-incongruency difference was -4 ms and -7 ms, respectively. In
contrast, the inflection of the adjective prime did matter for both typical
nouns and atypical nouns, and it mattered more for the typical nouns than the
atypical nouns. The congruency-incongruency difference was 60 ms and 22 ms,
respectively. In sum, the data suggest that the magnitude of the gender
congruency/incongruency effect differed between typical and atypical
masculine nouns.

The fourth question addressed parallels the third. Does the gender
congruency/incongruency effect differ between typical and atypical feminine
nouns? The answer in this case is negative. A group x prime lexicality x
prime inflection x target inflection (typical vs. atypical) analysis of
variance revealed only one significant effect, namely, the main effect of
prime inflection, F(1,48) = 17.30, MSe = 6675, p < .001; A-inflected primes
were associated with faster lexical decisions (648 ms) than ø-inflected primes
(678 ms), as ought to be the case for feminine noun targets.

Finally, with respect to the pseudonoun data, separate analyses of variance
revealed that for both the typical and atypical cases there was a significant
effect of target inflection (ø vs. A): F(1,51) = 6.54, MSe = 3050, p < .01,
and F(1,51) = 4.77, MSe = 4290, p < .05, respectively. Pseudonouns ending in
A were rejected more slowly. A further significant effect was observed in the
atypical analysis, namely, the interaction of prime lexicality and target
inflection, F(1,51) = 18.90, MSe = 2827, p < .001. Whereas ø-inflected
atypical pseudonouns were responded to faster when preceded by a
pseudo-possessive adjective, A-inflected atypical pseudonouns were responded
to faster when preceded by a possessive adjective. The data equivocate on
whether or not rejecting pseudonouns was made more difficult by a
grammatically and lexically proper context.

Discussion 

In the present experiments, possessive adjectives provide a minimal
grammatical context for nouns in the nominative singular. With case and
number held constant it is shown that when the two words agree in gender,
lexical decision on the target noun is faster than when the two words disagree
in gender. A previous experiment (Gurjanov et al., 1985) found no effect of
case congruency on the processing of nouns in the nominative singular. That
gender congruency does affect the processing of nominative singulars may have
implications for the representation of inflected nouns in the internal lexicon
(Lukatela et al., 1980).

The lesson learned from Experiment 2 is that the gender
congruency/incongruency effect is not mediated by visual identity or phonemic
identity of the morphemes that inflect the possessive adjective and the noun.
This latter observation implies that the gender congruency/incongruency
effect must involve the recognition of the genders of the possessive adjective
and the noun, which implies, in turn, that gender is part of a word's
representation in the lexicon. It is not presumptuous to assume that one's
knowledge of words includes a knowledge of the grammatical arrangements into
which they may enter. To know that the feminine possessive adjective MOJA
cannot be entered into a grammatical arrangement with the masculine nouns
LONAC or DEDA is to know that MOJA and LONAC or MOJA and DEDA are of unlike
gender. On the other hand, to know that the masculine possessive adjective
MOJ can be linked to the masculine nouns LONAC and DEDA is to know that these
words are alike in case, number, and gender.

The argument that there is a syntactical/grammatical processor is an
argument for a device separate from the device that accesses lexical
representations and separate from the device that assigns meaning to an
arrangement of words (cf. Forster, 1979). The syntactic/grammatical processor
assigns a syntactical structure or a grammatical relation to a context-target
arrangement. It obviously has a degree of autonomy; there are many celebrated
examples of English syntactical structure being assignable to a list of
nonsense letter strings. However, with respect to the question of the
information with which the syntactic or grammatical process works, it must be
supposed that that information is derived in large part by the lexical
processor. Seidenberg et al. (1982) showed that in English lexical priming
contexts, facilitation effects are not indifferent to the grammatical function
of words, and argue for a model of the internal lexicon enriched by
syntactical details. This argument is consonant with the suggestions of
Kaplan and Bresnan (1982) and Gazdar (1982) in theoretical linguistics and
continuous with the experimental efforts of Huttenlocher and Lui (1979),
Miller and Johnson-Laird (1976), and others to distinguish the mental
representations of different word classes.

Given the notions of lexical processor, grammatical processor, and message
processor (Forster, 1979) as three relatively independent systems underlying
lexical decision, an account of the gender congruency/incongruency effect
takes the following form (after West & Stanovich, 1982). When a grammatically
congruent pair (e.g., MOJ LONAC, MOJ DEDA, MOJA PTICA, or MOJA MATER) is
presented, the outputs from the lexical processor, grammatical processor, and
message processor are all positive: the ideal situation for a subsequent
decision-making mechanism that must arrive at the appropriate response "yes."
However, when a grammatically incongruent pair (e.g., MOJA LONAC, MOJA DEDA,
MOJ PTICA, or MOJ MATER) is presented, the output from the lexical processor
is positive and so, perhaps, is the output from the message processor, but the
output from the grammatical processor is negative. The information made
available to the grammatical processor from the lexical processor is that the
context is one gender and the target is another gender. Consequently, the
situation for the decision-making system is less than ideal; there are
discrepancies in the outputs and the "no" bias from the grammatical processor
must be overcome (West & Stanovich, 1982). As a result, lexical decision to a
grammatically incongruent pair (e.g., MOJA LONAC) is slower than lexical
decision to a grammatically congruent pair (e.g., MOJ LONAC).
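
The account above can be illustrated with a small sketch. It is only a toy rendering under stated assumptions, not the authors' model: the gender table, the baseline latency, and the cost attached to a negative grammatical output are hypothetical values chosen for illustration, and the names GENDER, process, and decision_latency do not come from the source. The one feature taken from the text is that the grammatical processor works with gender information supplied by the lexical processor, so it produces no output when that information is missing (as with a pseudoadjective prime).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon: gender of the example items cited in the text.
GENDER = {
    "MOJ": "masculine",    # masculine possessive adjective
    "MOJA": "feminine",    # feminine possessive adjective
    "LONAC": "masculine",
    "DEDA": "masculine",   # masculine noun ending in A
    "PTICA": "feminine",
    "MATER": "feminine",
}

BASE_MS = 650          # assumed baseline latency, not estimated from the data
NO_BIAS_COST_MS = 40   # assumed cost of overriding a negative grammatical output


@dataclass
class Outputs:
    lexical: bool                 # is the target a word?
    grammatical: Optional[bool]   # None when no gender information is available
    message: Optional[bool]       # left open by the account ("perhaps" positive)


def process(prime: str, target: str) -> Outputs:
    """Run the three (toy) processors on a prime-target pair."""
    lexical = target in GENDER
    if lexical and prime in GENDER:
        grammatical = GENDER[prime] == GENDER[target]
    else:
        grammatical = None
    message = True if lexical else None
    return Outputs(lexical, grammatical, message)


def decision_latency(outputs: Outputs) -> int:
    """A discrepancy among the outputs slows the "yes" decision."""
    penalty = NO_BIAS_COST_MS if outputs.grammatical is False else 0
    return BASE_MS + penalty


print(decision_latency(process("MOJ", "LONAC")))   # 650
print(decision_latency(process("MOJA", "LONAC")))  # 690
```

In this sketch a pseudoword prime leaves the grammatical output undefined, so no cost is added; whether that is the right treatment of pseudoadjective contexts is an assumption of the illustration.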

The foregoing account is sufficiently general to accommodate the
syntactic or grammatical priming effects found with English language materials
(Goodman et al., 1981; Wright & Garrett, 1984) and those found with
Serbo-Croatian language materials. Where the account is weak is in its
failure to distinguish those components of grammatical processing that are
automatic or reflexive (Fodor, 1983; Wright & Garrett, 1984) from those that
are merely strategic, that is, those that are "conscious-attentive" and shaped
by the conditions of the experiment. This failure is due in part to the lack
of data relevant to the contrast. It has been established empirically that
associative priming involves components of both kinds and the theory of
associative priming ably recognizes the distinction (Neely, 1977). If
syntactic or grammatical priming proves to depend similarly on a fast-acting
automatic process and a slow-acting conscious-attentive process, then this
much seems certain: In syntactic or grammatical priming both of these
processes are post-lexical (Gurjanov et al., 1985; Seidenberg et al., 1984;
West & Stanovich, 1982).
Abstract. Two experiments are reported in which subjects made rapid
lexical decisions about inflected nouns preceded by inflected
adjectives or pseudoadjectives that did or did not agree
grammatically. Both adjectives and pseudoadjectives were shown to affect
lexical decision times for nouns, suggesting that the priming of
inflected nouns by inflected adjectives occurred at the level of the
inflections. Inflected pseudonouns, however, were not affected
similarly, suggesting that lexical factors were contributing to the
priming in addition to grammatical factors. This instance of
grammatical priming is described as an effect that arises
post-lexically, based on the outcomes of relatively independent lexical
and syntactical processors.

Two broad questions may be raised with regard to the processing of nouns
in an inflected language: (1) How are the cases of a noun organized with
regard to each other in the internal lexicon? and (2) How are inflected nouns
linked to other lexical types such as prepositions and inflected adjectives?
Serbo-Croatian is an inflected language in which the noun takes a gender
(masculine, feminine, or neuter) and is declined in seven forms (nominative,
accusative, instrumental, genitive, dative, locative, vocative), both in the
singular and the plural. The fourteen inflected forms of a Serbo-Croatian
noun can be viewed as forming a noun system (Lukatela, Gligorijević, Kostić, &
Turvey, 1980). Ordinarily an inflected Serbo-Croatian noun in a sentence is
grammatically related to a preposition and to one or more adjectives.
Although they are not declined, prepositions are specific to inflected noun
endings. A given preposition goes with at least one noun case, sometimes
several cases, but never with all noun cases. Adjectives are declined but not
necessarily with the same inflected endings as nouns. When qualifying a noun,
however, the inflection of the adjective and the inflection of the noun must
agree grammatically (for example, if the noun is masculine and in the singular
accusative form, the adjective must be masculine and in the singular
accusative form).

With respect to the first question raised above, on the organization of
the cases, there is evidence to suggest that frequency (precisely, the
frequencies with which the various inflected noun forms occur in ordinary
language usage) is not a major determinant of a Serbo-Croatian noun system's
organization. In a lexical decision task nouns in the nominative singular
form were accepted as words faster than nouns in the oblique forms. Among the
oblique forms, however, decision times did not differ despite marked
differences among the oblique forms in their respective frequencies of
occurrence (Lukatela et al., 1978; Lukatela et al., 1980). Apparently, the
nominative and oblique forms are qualitatively distinguished in the
organization of a noun system, with the nominative assuming a pivotal role.
However, in either an oblique or nominative form a noun appears to be
represented in the lexicon as a single unit corresponding to the complete word
rather than as a combination of distinct units corresponding to morphemic
constituents. The stems and suffixes of Serbo-Croatian nouns do not appear to
be stored separately. An observation of the unitary representation of nouns,
however, does not rule out the possibility that noun representations indicate
their stem/suffix structure (Stanners, Neiser, & Painton, 1979; Taft &
Forster, 1975).

With respect to the second question raised above (on the processing
relation of nouns to other lexical types), it has been shown that with
Serbo-Croatian words a preposition preceding a noun case with which it is
grammatically consistent speeds up lexical decision on the noun. However,
lexical decision on an inflected noun form that is grammatically inconsistent
with the preceding preposition is not appreciably slowed (Lukatela, Kostić,
Feldman, & Turvey, 1983). Facilitation (and inhibition) effects among words
are often explained (but not always, see Discussion) by a notion of activation
spreading out from one excited region of the lexicon to neighboring regions
and/or by a notion of a directing of attention to a specified region of the
lexicon. The first of these mechanisms may be suited to semantic relations
among lexical entries but it is not easily generalized to grammatical
relations such as between members of a closed class like prepositions and an
open class like nouns (and it is not easily generalized to semantic relations
in natural discourse, as Foss [1982] has noted). The notion of an automatic
spread of activation refers to a specific linkage between particular
representations of particular words (see Collins & Loftus, 1975): (direct)
stimulation of one lexical representation leads mechanically and inevitably to
the (indirect) stimulation of other lexical representations. The relation of
prepositions to nouns, however, is not sensibly portrayed as linkages among
particular internal representations of complete words. (What would
rationalize the linkage of above and elephant?) If there are linkages one
might expect them to be defined over the small set of prepositions and the
small set of morphemes that comprise the inflected endings of nouns. By such
an account, prepositions would not be linked to the very many noun systems but
to the few sets of inflected endings that the very many noun systems share.
The problem with this account is that the inflected endings of
(Serbo-Croatian) nouns do not appear to be stored as sets separately from
their stems.

The present experiments extend the inquiry into Serbo-Croatian nouns and
their processing relation to other word types. Here the focus is the relation
of nouns to adjectives. Two related questions are raised. First, can
adjectives affect the time to lexically evaluate nouns with which they are
grammatically consistent? And second, if adjectives can affect lexical
decisions on nouns, do they do so at the morphological level, that is, the
level of stems and affixes (rather than, say, the whole word level)? Support
for the view that adjectival influences on nouns can be mediated by processes
at the level of inflected endings would be provided by the demonstration that
both adjective contexts and pseudoadjective contexts (letter strings derived
from adjectives by changing the initial or middle consonant) expedite lexical
decisions on noun targets when the inflection of the contextual item and the
target are in grammatical agreement.

The selection of nouns used in the experiments was guided by the following
considerations. With a few exceptions Serbo-Croatian nouns fall into
three declensional classes according to the inflected ending of the genitive
singular case. These three classes are designated (after Bidwell, 1970) as
Class A (where the genitive singular ending is /e/, for example, ZENE), Class
0 (where the genitive singular ending is /a/, for example, COVEKA), and Class
C (where the genitive singular ending is /i/, for example, STVARI). The
dominating gender for Class A nouns is feminine. The nouns in Class C are
almost exclusively feminine but Class C occurs less frequently than Class A.
Class 0 nouns are mostly masculine and neuter nouns. From a consideration of
nouns in the ordinary, written language, Kostić (1965) reported that the
masculine gender accounts for 52 percent, the feminine gender for 36 percent,
and the neuter gender for 12 percent. Consequently, the nouns in the corpus
of words from which the stimuli of the present experiment were drawn occurred
in the three genders in approximately the proportions identified by Kostić,
with the masculine and neuter nouns drawn from the declension Class 0 and the
feminine nouns drawn from the declension Class A.

The adjectives in the corpus of words from which the stimuli were drawn
were common adjectives all declined as indefinite adjectives. Common
adjectives are those that can be declined both definitely and indefinitely.
The indefinite declension of an adjective applies when the function is either
predication or attribution. In the latter role the indefinite adjective is
not accompanied by a deictic such as "this," "that," etc., and is
referentially vague. Definite adjectives are restricted to the attributive
function and are always conjoined with a deictic. When an adjective qualifies
an inanimate noun in the masculine gender, the indefinite and definite
declensions are distinguished by the inflected endings of the nominative
singular and accusative singular. There are, however, no such written
distinctions for the definite and indefinite adjectival declensions when the
word being qualified is an inanimate noun in the feminine gender (although
such distinctions can be found in the spoken language in the form of stress
variations). The choice of the referentially less precise indefinite
declension was motivated, in part, by the desire to keep to a minimum the
semantic relation between the adjectival and nominal forms paired in the
experiments.

Experiment  1 

The first experiment was directed at the effect of grammatical consistency
between adjectives (real and pseudo) and nouns in the nominative singular
and genitive singular cases. These two cases are the most frequently
occurring noun cases, the nominative singular accounting for approximately 25
percent, and the genitive singular accounting for approximately 20 percent, of
all instances of the noun (Kostić, 1965; Lukatela et al., 1980). The
inflections of these two cases for adjectives and nouns of all three genders
are shown in Table 1. Only for the feminine gender are the adjectival and
nominal inflections identical.


Table 1

Nominative Singular and Genitive Singular Inflections of Serbo-Croatian
Adjectives and Nouns as a Function of Gender

                          MASCULINE          FEMININE           NEUTER
                       ADJECTIVE  NOUN    ADJECTIVE  NOUN    ADJECTIVE  NOUN

NOMINATIVE SINGULAR        ø       ø          A       A          ø      E or O
GENITIVE SINGULAR          OG      A          E       E          OG     A

ø = null morpheme


There is some reason to believe that the effect of a preceding
grammatically consistent adjective on lexical decision will not be of the same
magnitude for nouns in the nominative singular case and nouns in the genitive
singular case. As noted above, the nominative singular of a noun is
qualitatively distinguished from the oblique cases of a noun and appears to
play a pivotal role in the organization of a noun's case system (Lukatela et
al., 1980). Moreover, the nominative singular is less dependent on
grammatical factors for its interpretation than are the oblique cases (see
Lukatela et al., 1983). It was expected, therefore, that for nouns in the
genitive singular lexical decision would be fastest when the prime was
grammatically consistent, but for nouns in the nominative singular lexical
decision times would be less sensitive to the grammatical consistency of prime
and target.

Method 


Subjects.  Fifty-six  undergraduate  students  from  the  Department  of 
Psychology  at  the  University  of  Belgrade  participated  in  the  experiment.  All 
subjects  had  previously  participated  in  reaction  time  experiments. 

Materials. A list of 150 adjective-noun pairs was constructed with all
adjectives and nouns (1) drawn from the mid-frequency range of the Kostić
table, (2) in the nominative singular form, and (3) comprising pairs that were
congruent in gender. This list was presented to 70 students (from the
Department of Linguistics) who judged the associative strength of each pair,
that is, the degree to which the adjective and the noun in a pair were related.
The twenty-eight adjective-noun pairs that were judged to be most weakly
associated were used to generate four groups of 28 word-word pairs:
nominative singular-nominative singular pairs, nominative singular-genitive
singular pairs, genitive singular-nominative singular pairs, and genitive
singular-genitive singular pairs. (In each of the foregoing pair types, the
first case is that of the adjective and the second case is that of the noun.)
A different set of adjective-noun pairs, drawn from the original list of
150 pairs, was used to generate four corresponding groups of 28
pseudoadjective-pseudonoun pairs (by changing either the initial or middle
letter of both the adjective and the noun). Another set of 28 pairs from the
original list of 150 pairs was used to generate four corresponding groups of
28 pseudoadjective-noun pairs (by changing either the initial or middle letter
of the adjective). Finally, one further, different set of 28 pairs was
transformed into four corresponding groups of 28 adjective-pseudonoun pairs
(by changing either the initial or middle letter of the noun). Throughout the
generation of these different groups (those that paired pseudowords or paired
a pseudoword with a word), the pseudoword version of a noun or adjective in
nominative singular or genitive singular preserved the case ending.

The adjectives and pseudoadjectives were presented as Roman letter
strings (IBM Gothic) arranged horizontally in the upper half of 35 mm slides.
In contrast, nouns and pseudonouns were arranged horizontally in the lower
half of 35 mm slides. The "adjective" slides and the "noun" slides were
grouped into pairs as determined above to yield a total of 448 pairs of slides
(28 x 4 x 4), of which a given subject saw 112 pairs.

Design.  The  major  constraint  on  the  design  of  the  experiment  was  that  a 
given  subject  never  encountered  a  given  word  or  pseudoword  in  any  of  the  pairs 
more  than  once.  This  was  achieved  by  dividing  subjects  into  four  groups  with 
14  subjects  in  each  group  and  by  dividing  each  set  of  28  pairs  into  four 
subgroups  of  7  pairs.  In  sum,  a  subject  saw  7  pairs  of  stimuli  from  each  of 
the  16  groups  of  pairs.  Put  differently,  each  subject  saw  the  same  adjectives 
and  nouns  as  every  other  subject  but  not  necessarily  in  the  same  grammatical 
case  nor  necessarily  in  the  same  type  of  nominative-genitive  permutation. 
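
The counts in this design can be checked with a short sketch. The text does not say how subgroups of pairs were assigned to case permutations for each subject group, so the rotation used below (a Latin-square-style shift) is an assumption made for illustration, as are the names build_schedule, PAIR_TYPES, and CASE_PERMUTATIONS. Only the totals (28 pairs per set, four pair types, four case permutations, subgroups of 7, four subject groups) come from the description above.

```python
from collections import Counter

PAIR_TYPES = [
    "adjective-noun",
    "pseudoadjective-pseudonoun",
    "pseudoadjective-noun",
    "adjective-pseudonoun",
]
# Case of the prime, then case of the target (NS = nominative singular,
# GS = genitive singular).
CASE_PERMUTATIONS = ["NS-NS", "NS-GS", "GS-NS", "GS-GS"]
PAIRS_PER_SET = 28
SUBGROUP_SIZE = 7
N_SUBJECT_GROUPS = 4


def build_schedule(subject_group):
    """List the (pair type, item index, case permutation) triples seen by one
    subject group. The rotation rule is an assumed Latin-square assignment."""
    schedule = []
    for pair_type in PAIR_TYPES:
        for item in range(PAIRS_PER_SET):
            subgroup = item // SUBGROUP_SIZE  # 0..3
            rotation = (subgroup + subject_group) % len(CASE_PERMUTATIONS)
            schedule.append((pair_type, item, CASE_PERMUTATIONS[rotation]))
    return schedule


total_slide_pairs = PAIRS_PER_SET * len(PAIR_TYPES) * len(CASE_PERMUTATIONS)
schedule = build_schedule(0)
cell_counts = Counter((pair_type, perm) for pair_type, _, perm in schedule)

print(total_slide_pairs)          # 448
print(len(schedule))              # 112
print(set(cell_counts.values()))  # {7}
```

Under this assumed rotation every item appears once per subject and, across the four subject groups, once in each of the four case permutations, which matches the constraint that no subject encounters a given word or pseudoword more than once.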

Procedure. On each trial, two slides were presented. The subject's task
was to decide as rapidly as possible whether the letter string contained in a
slide was a word. Each slide was exposed in one channel of a three-channel
tachistoscope (Scientific Prototype, Model CB) illuminated at 10.3 cd/m2.
Both hands were used in responding to the stimuli. Both thumbs were placed on
a telegraph key button close to the subject and both forefingers on another
telegraph key button two inches further away. The closer button was depressed
for a "No" response (the string of letters was not a word), and the further
button was depressed for a "Yes" response (the string of letters was a word).

Latency was measured from the onset of a slide. The subject's response
to the first slide terminated its duration and initiated the second slide
unless the latency exceeded 1300 ms, in which case the second slide was
initiated automatically. The duration of the second slide, unlike that of the
first, was fixed at 1300 ms.

Results  and  Discussion 

A mean reaction time was computed for each subject by averaging over the
seven nouns or seven pseudonouns in each group of prime-target pairs.
Reaction times shorter than 300 ms or longer than 1300 ms were excluded, as
were the times associated with erroneous responses. The total number of
responses excluded by the preceding criteria did not exceed 1.5 percent.
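
The exclusion rule can be written out as a brief sketch. The function name trimmed_mean_rt and the sample trials are hypothetical; only the 300 ms and 1300 ms cutoffs and the removal of erroneous responses come from the text.

```python
MIN_RT_MS = 300
MAX_RT_MS = 1300


def trimmed_mean_rt(trials):
    """Mean latency for one subject in one group of prime-target pairs.

    `trials` is a sequence of (rt_ms, correct) tuples. Erroneous responses
    and latencies outside the 300-1300 ms window are dropped before
    averaging. Returns None if no trial survives.
    """
    kept = [rt for rt, correct in trials
            if correct and MIN_RT_MS <= rt <= MAX_RT_MS]
    return sum(kept) / len(kept) if kept else None


# Hypothetical data for the seven items of one cell.
example = [(250, True), (700, True), (720, True), (1350, True),
           (680, False), (740, True), (690, True)]
print(trimmed_mean_rt(example))  # 712.5
```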
Table 2

Lexical Decision Latencies and Percentage Error for Pseudonouns in
Experiment 1 as a Function of Type and Grammatical Case of Adjectival Prime

                                      Grammatical case of target pseudonoun
Type of prime     Grammatical case       NOMINATIVE           GENITIVE
                  of prime             RT(a)  Error(b)     RT(a)  Error(b)

ADJECTIVE         NOMINATIVE            822     3.3         845     5.1
                  GENITIVE              83^     3.3         833     2.8

PSEUDOADJECTIVE   NOMINATIVE            821     3.1         822     3.3
                  GENITIVE              824     3.1         833     2.6

(a) reaction time (ms)
(b) error (%)


Table 3

Lexical Decision Latencies and Percentage Error for Nouns in Experiment 1
as a Function of Type and Grammatical Case of Adjectival Prime

                                      Grammatical case of target noun
Type of prime     Grammatical case       NOMINATIVE           GENITIVE
                  of prime             RT(a)  Error(b)     RT(a)  Error(b)

ADJECTIVE         NOMINATIVE            727     4.3         781     4.8
                  GENITIVE              720     4.6         744     4.8

PSEUDOADJECTIVE   NOMINATIVE            713     7.9         795     6.6
                  GENITIVE              694     4.3         773     2.8

(a) reaction time (ms)
(b) error (%)
Table 2 reports the pseudonoun data. As can be seen, there were no
differences due to the type of prime, the grammatical case of the prime, or
the grammatical case (inflected ending) of the pseudonoun. The mean reaction
times to the primes themselves were 706 ms and 726 ms, respectively, for
adjectives in the nominative singular and genitive singular forms, and 841 ms
and 870 ms, respectively, for pseudoadjectives inflected in the fashion of the
nominative singular and genitive singular. Table 3 reports the noun data.
The only effects that were significant according to the analysis of variance
on both subject and item means were grammatical case of the adjectival prime
(F(1,52) = 24.31, MSe = 2082, p < .001, and F(1,27) = 4.46, MSe = 5676, p <
.05) and grammatical case of the noun target (F(1,52) = 145.26, MSe = 2733, p
< .001, and F(1,27) = 26.36, MSe = 7532, p < .001).
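
The direction of these two main effects can be read off the noun latencies of Table 3. The sketch below averages the eight cell means; it assumes equal weighting of cells (unweighted marginal means), which is a simplification, and the names TABLE_3_RT and marginal_mean are illustrative rather than taken from the source.

```python
# Cell means (ms) for nouns, keyed by (prime type, prime case, target case).
TABLE_3_RT = {
    ("adjective", "nominative", "nominative"): 727,
    ("adjective", "nominative", "genitive"): 781,
    ("adjective", "genitive", "nominative"): 720,
    ("adjective", "genitive", "genitive"): 744,
    ("pseudoadjective", "nominative", "nominative"): 713,
    ("pseudoadjective", "nominative", "genitive"): 795,
    ("pseudoadjective", "genitive", "nominative"): 694,
    ("pseudoadjective", "genitive", "genitive"): 773,
}

PRIME_CASE, TARGET_CASE = 1, 2  # positions of the two factors in each key


def marginal_mean(factor_index, level):
    """Unweighted mean of the cell means at one level of one factor."""
    values = [rt for key, rt in TABLE_3_RT.items() if key[factor_index] == level]
    return sum(values) / len(values)


print(marginal_mean(PRIME_CASE, "nominative"))   # 754.0
print(marginal_mean(PRIME_CASE, "genitive"))     # 732.75
print(marginal_mean(TARGET_CASE, "nominative"))  # 713.5
print(marginal_mean(TARGET_CASE, "genitive"))    # 773.25
```

On these unweighted means, nouns were accepted about 21 ms faster after genitive-inflected primes than after nominative-inflected primes, and nominative targets were accepted about 60 ms faster than genitive targets.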

The failure to observe a significant priming effect by either adjectives
or pseudoadjectives might have been expected. Approximately half of the words
used in the experiment were feminine. The genitive singular form of feminine
nouns (and adjectives) is identical to the nominative plural form of feminine
nouns (and adjectives) (see Table 1). As noted, the nominative singular case
of nouns has not proven to be sensitive to priming. If the nominative plural
is similarly indifferent to priming and if the feminine "genitive singular"
noun forms of the present experiment were interpreted as nominative plural
forms, then the adjectival and pseudoadjectival priming of nouns would be
thwarted. Table 4 distinguishes the mean decision times for the
masculine/neuter items from those for the feminine items. Inspection of
Table 4 suggests that (1) adjectival and pseudoadjectival effects were present
for the masculine/neuter genitive singular forms (corroborated by a subject
analysis, F(1,52) = 6.22, MSe = 21961, p < .02, but not by an item analysis)
and absent for the feminine genitive singular forms (the prime case by target
case interaction was not significant by either subjects or items analysis);
and (2)


Table  4 

Lexical  Decision  Latencies  of  Experiment  1  as  a  Function  of  Noun  Gender 


gender  and 

case  of  target  noun 

Type  of  prime 

grammatical  case 
of  prime 

masculine/neuter 
NOMINATIVE  GENITIVE 

feminine 

NOMINATIVE 

GENITIVE 

NOMINATIVE 

721 

810 

739 

719 

ADJECTIVE 

GENITIVE 

696 

746 

731 

727 

NOMINATIVE 

708 

833 

719 

731 

PSEUDOADJECTIVE 

GENITIVE 

697 

803 

704 

728 

the  commonly  obtained  (e.g. ,  Lukatela  et  al.,  1978;  Lukatela  et  al.,  1980; 
Lukatela  et  al.,  1983)  faster  decision  times  for  nominative  singular  forms 
relative  to  oblique  forms  was  not  found  with  the  feminine  noun  data,  implying 
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that  the  feminine  nouns  in  the  "genitive  singular"  were  not  being  interpreted 
as  such.  It  should  be  noted  that  a  similar  but  less  pronounced  confounding  of 
cases  is  also  true  for  the  neuter  genitive  singular  (which  is  written  identi¬ 
cally  to  the  nominative  plural  and  genitive  plural).  However,  whereas  for  the 
feminine  gender  both  nouns  and  adjectives  assume  identical  forms  in  the 
genitive  singular  and  nominative  plural,  for  the  neuter  gender  Identity  of 
forms  holds  only  for  nouns. 

Experiment  2 

The second experiment used the same design, the same procedure, and the
same adjective-noun pairs as the first experiment but replaced the genitive
singular case with the dative-locative singular case and tested a new group
of 56 subjects from the same subject pool. In the declension of adjectives
and nouns the dative singular and the locative singular are identical in each
of the three genders. The characteristic inflections common to dative
singular and locative singular are shown in Table 5. With respect to the noun
case confoundings identified above, the dative singular-locative singular
inflection across the three genders is not shared with the nominative plural
and, in fact, is shared with no other case. Thus, in comparison to
Experiment 1, grammatical priming between feminine gender words should be
observed in Experiment 2 if, indeed, the failure to obtain such priming in
Experiment 1 was due to case confounding.


Table 5

Inflections of Dative Singular and Locative Singular Adjectives and Nouns as a
Function of Gender

             Masculine   Feminine   Neuter

ADJECTIVE    OM          OJ         OM
NOUN         U           I          U

Results  and  Discussion 

The mean lexical decision latencies were computed in the manner described
in Experiment 1. The positive and negative responses to the adjectival primes
were similar in pattern to those reported for Experiment 1. Negative
responses to the pseudonoun targets are given in Table 6. No main effects or
interactions were significant. Table 7 reports the noun data for all three
genders taken together. The analysis of variance on subject means and item
means (reported in parentheses) revealed significant effects for the
grammatical case of the adjective prime, F(1,52) = 40.59, MSe = 1372, p < .001
(F(1,27) = 8.05, MSe = 3460, p < .01); for the grammatical case of the noun
target, F(1,52) = 61.27, MSe = 2508, p < .001 (F(1,27) = 22.98, MSe = 3343,
p < .001); for the type of adjectival prime, F(1,52) = 7.88, MSe = 4104,
p < .01 (F(1,27) = 8.85, MSe = 1827, p < .01); and for the interaction between
the grammatical case of the adjectival prime and the grammatical case of the
noun target, F(1,52) = 16.10, MSe = 1841, p < .001 (F(1,27) = 4.84,
MSe = 3060, p < .05). All other two-way and three-way interactions were
nonsignificant. The significance of the type of adjectival prime may be
attributed to the difference between responding positively to two successive
stimuli (in the adjective trials) and responding negatively to the first
stimulus and positively to the second stimulus (in the pseudoadjective
trials). Intuitively, this interpretation suggests slower decision times for
targets following pseudoadjectives. Inspection of Table 7 (and of Table 3)
shows, to the contrary, that pseudoadjective primes were associated with
overall faster decisions. One is tempted to say that the effect of word
primes is predominantly "inhibitory."


Table 6

Lexical Decision Latencies and Percentage Error for Pseudonouns in Experiment
2 as a Function of Type and Grammatical Case of Adjectival Prime

                                     Grammatical case of target noun
Type of prime    Grammatical case    NOMINATIVE            DATIVE/LOCATIVE
                 of prime            RT (ms)   Error (%)   RT (ms)   Error (%)

ADJECTIVE        NOMINATIVE          758       5.6         758       ...
                 DATIVE/LOCATIVE     757       3.6         774       2.0
PSEUDOADJECTIVE  NOMINATIVE          763       1.8         767       4.8
                 DATIVE/LOCATIVE     750       3.3         760       4.8


Table 7

Lexical Decision Latencies and Percentage Error for Nouns in Experiment 2 as a
Function of Type and Grammatical Case of Adjectival Prime

                                     Grammatical case of target noun
Type of prime    Grammatical case    NOMINATIVE            DATIVE/LOCATIVE
                 of prime            RT (ms)   Error (%)   RT (ms)   Error (%)

ADJECTIVE        NOMINATIVE          672       3.3         726       3.1
                 DATIVE/LOCATIVE     668       2.6         685       4.6
PSEUDOADJECTIVE  NOMINATIVE          656       3.8         708       3.3
                 DATIVE/LOCATIVE     647       0.8         673       3.3
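
The size of the prime-type effect and of the prime case by target case
interaction can be read off the noun latencies of Table 7 with simple
arithmetic. The sketch below is illustrative only: it works on unweighted
cell means rather than on the subject and item means that entered the
analysis of variance, and the names TABLE_7, mean_by_prime_type, and
interaction_contrast are introduced here for the example.

```python
# Cell means (ms) for noun targets in Experiment 2 (Table 7), keyed by
# (prime type, prime case, target case). "NOM" = nominative,
# "DL" = dative/locative.
TABLE_7 = {
    ("adjective", "NOM", "NOM"): 672,
    ("adjective", "NOM", "DL"): 726,
    ("adjective", "DL", "NOM"): 668,
    ("adjective", "DL", "DL"): 685,
    ("pseudoadjective", "NOM", "NOM"): 656,
    ("pseudoadjective", "NOM", "DL"): 708,
    ("pseudoadjective", "DL", "NOM"): 647,
    ("pseudoadjective", "DL", "DL"): 673,
}


def mean_by_prime_type(table, prime_type):
    """Unweighted mean latency over the four cells of one prime type."""
    cells = [rt for (ptype, _, _), rt in table.items() if ptype == prime_type]
    return sum(cells) / len(cells)


def interaction_contrast(table, prime_type):
    """Prime case by target case contrast for one prime type.

    Computed as the effect of prime case on dative/locative targets
    minus the effect of prime case on nominative targets.
    """
    effect_on_dl = table[(prime_type, "NOM", "DL")] - table[(prime_type, "DL", "DL")]
    effect_on_nom = table[(prime_type, "NOM", "NOM")] - table[(prime_type, "DL", "NOM")]
    return effect_on_dl - effect_on_nom


if __name__ == "__main__":
    for ptype in ("adjective", "pseudoadjective"):
        print(ptype, mean_by_prime_type(TABLE_7, ptype),
              interaction_contrast(TABLE_7, ptype))
```

On these cell means, targets following pseudoadjectives are about 17 ms
faster overall than targets following adjectives, and the effect of prime case
is 37 ms (adjectives) and 26 ms (pseudoadjectives) larger for dative/locative
targets than for nominative targets.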

Table 8 reports the mean lexical decision times for the nouns partitioned
according to the masculine/neuter gender and feminine gender categories.
Inspection of Table 8 and comparison with the pattern of results in Table 4
suggest that grammatical priming occurred in both categories in the second
experiment, in contrast to the first. This outcome lends credence to the
interpretation given of the feminine gender data of the first experiment.


Table 8

Lexical Decision Latencies of Experiment 2 as a Function of Noun Gender

                                     Gender and case of target noun
Type of prime    Grammatical case    masculine/neuter        feminine
                 of prime            NOMINATIVE  DATIVE/     NOMINATIVE  DATIVE/
                                                 LOCATIVE                LOCATIVE

ADJECTIVE        NOMINATIVE          663         717         682         741
                 DATIVE              652         672         680         685
PSEUDOADJECTIVE  NOMINATIVE          651         708         662         708
                 DATIVE              643         669         649         682
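
The contrast between the two experiments can be made concrete by computing,
for oblique-case targets only, how much faster decisions were after a
case-matched prime than after a nominative prime. The following sketch uses
the cell means of Tables 4 and 8; the dictionary layout and the function
names are assumptions made for the illustration, and no inferential
statistics are implied.

```python
# Mean latencies (ms) for oblique-case noun targets (genitive in Experiment 1,
# dative/locative in Experiment 2), taken from Tables 4 and 8.
# Each value is (after nominative prime, after case-matched prime),
# i.e. (grammatically incongruent, grammatically congruent).
OBLIQUE_TARGETS = {
    ("Experiment 1", "masculine/neuter", "adjective"): (810, 746),
    ("Experiment 1", "masculine/neuter", "pseudoadjective"): (833, 803),
    ("Experiment 1", "feminine", "adjective"): (719, 727),
    ("Experiment 1", "feminine", "pseudoadjective"): (731, 728),
    ("Experiment 2", "masculine/neuter", "adjective"): (717, 672),
    ("Experiment 2", "masculine/neuter", "pseudoadjective"): (708, 669),
    ("Experiment 2", "feminine", "adjective"): (741, 685),
    ("Experiment 2", "feminine", "pseudoadjective"): (708, 682),
}


def congruency_advantage(incongruent_ms, congruent_ms):
    """Positive values mean faster decisions in the congruent context."""
    return incongruent_ms - congruent_ms


def advantages(data):
    """Congruency advantage (ms) for every experiment/gender/prime-type cell."""
    return {key: congruency_advantage(*pair) for key, pair in data.items()}


if __name__ == "__main__":
    for key, value in advantages(OBLIQUE_TARGETS).items():
        print(key, value)
```

With these means the advantage for feminine oblique targets is -8 ms and 3 ms
in Experiment 1 but 56 ms and 26 ms in Experiment 2, whereas masculine/neuter
targets show an advantage in both experiments (64 ms and 30 ms; 45 ms and
39 ms).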

Discussion 

The theoretically important descriptors "facilitation" and "inhibition"
are not applicable to the data of Experiments 1 and 2. In neither experiment
is there a neutral context to provide a baseline. The results are more
prudently summarized in terms of an inequality and an equality:

(1) The lexical decision time for a noun in a grammatically congruent
    adjective or pseudoadjective context is less than the lexical decision
    time for a noun in a grammatically incongruent adjective or
    pseudoadjective context; and

(2) The lexical decision time for a pseudonoun in a grammatically congruent
    adjective or pseudoadjective context is equal to the lexical decision
    time for a pseudonoun in a grammatically incongruent adjective or
    pseudoadjective context.

An adjective or pseudoadjective defines a minimal grammatical context
(cf. Kroll & Schweickert, 1978) for a target noun. In terms of a distinction
suggested by Seidenberg, Tanenhaus, Leiman, and Bienkowski (1982), this
minimal grammatical context is "nonpriming," meaning that it contains no
lexical items that are semantic relatives or associates of the target item.
By argument, a nonpriming context cannot have a selective influence on the
lexicon; a selective influence is solely a consequence of intralexical
processing. It is suggested that intralexical processing reflects the
interconnections of entities in semantic memory but does not reflect
grammatical structure and pragmatic knowledge (Forster, 1979). The context
that gives rise to intralexical processing, one that contains items
associatively and/or semantically related to the target, is termed "lexical
priming" by Seidenberg et al. (1982). In the introduction and elsewhere
(Lukatela, Moraca, Stojnov, Savić, Katz, & Turvey, 1982) it has been argued
that the effect on lexical decision of minimal grammatical contexts (e.g., a
preposition for a noun, a pronoun for a verb) does not lend itself to the
notion of processing based upon interconnections among individual lexical
representations. Consequently, as Lukatela et al. (1982) remark,
"...semantic facilitation and grammatical facilitation are probably best
understood not as expressions of a single mechanism but rather as an
expression of different mechanisms that stand in a complementary
relation...." (p. 299)

The sentiment of the preceding quotation is given expression in the
language-processing system proposed by Forster (1979). Forster's system is
composed of three subsystems: (1) a lexical processor that accesses the
representations in the lexicon of the target word and the context words (or
word); (2) a syntactic processor that assigns a syntactic structure to the
sentence constituted by the target word and its context; and (3) a message
processor that assigns meaning to the syntactic structure. All three
subsystems feed into a mechanism that, in the context of experiments,
functions simply as a decision-maker (e.g., is it a word?). Differences in
positive lexical decision times for target items associated with different
contexts may originate in the decision-making process, that is, postlexically
(West & Stanovich, 1982). Consider a grammatically congruent adjective-noun
pair in the present experiments. The output from the lexical processor and
the output from the syntactic processor will both be positive. Because of the
weak association between the words in the present experiments, the output
from the message processor might be negative or arise too slowly to
contribute to the decision making (cf. de Groot, Thomassen, & Hudson, 1982).
In contrast, for a grammatically incongruent adjective-noun pair the output
from the lexical processor will be positive but the output from the syntactic
processor will be negative. In order for the decision-making mechanism to
arrive at an appropriate response in the situation of an incongruent
adjective-noun pair it must overcome the bias toward a "no" decision
engendered by the syntactic processor. Overcoming this bias will take time
and consequently the lexical decision latency will be slowed relative to the
situation in which the adjective and noun are in grammatical agreement.

A similar account can be given of the differences between grammatically
congruent and grammatically incongruent pseudoadjective-noun pairs. Here,
however, it must be assumed that the syntactic processor responds positively
when there is an agreement of inflection despite the fact that the contextual
item is nonsense. Thus, for the lexical decision on the second member of a
grammatically congruent pseudoadjective-noun pair, the lexical processor and
the syntactic processor will both feed positively to the decision maker; only
the message processor's output will be negative. This is in contrast to the
situation in which the inflections of the pseudoadjective and noun do not
agree grammatically, for in this situation only the lexical processor's
output will be positive. Consequently, the decision making will have to
overcome more negative biasing and be slowed proportionately more relative to
the situation in which the pseudoadjective and noun are grammatically suited.
For pseudoadjective-noun pairs, then, lexical decision is faster when the
inflections agree than when they do not agree.
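
The logic of this account can be illustrated with a toy decision maker. The
sketch is not Forster's (1979) model: it merely assumes that each negative
processor output adds a fixed cost to a positive decision, and the constants
BASE_MS and PENALTY_MS, like all names in the block, are arbitrary values
chosen for the illustration rather than estimates from the data. The message
processor's output is assumed to be negative for the weakly associated pairs
used here.

```python
from dataclasses import dataclass

BASE_MS = 650     # hypothetical latency when no output opposes a "yes"
PENALTY_MS = 40   # hypothetical cost of overcoming one negative output


@dataclass(frozen=True)
class ProcessorOutputs:
    lexical: bool    # is the target's stem found in the lexicon?
    syntactic: bool  # do the inflections of context and target agree?
    message: bool    # does the pair yield a coherent meaning?


def yes_latency(outputs: ProcessorOutputs) -> int:
    """Latency of a positive decision: base time plus a cost per negative output."""
    negatives = [outputs.lexical, outputs.syntactic, outputs.message].count(False)
    return BASE_MS + PENALTY_MS * negatives


# A noun target after a weakly associated (pseudo)adjective prime.
CONGRUENT = ProcessorOutputs(lexical=True, syntactic=True, message=False)
INCONGRUENT = ProcessorOutputs(lexical=True, syntactic=False, message=False)

if __name__ == "__main__":
    print("congruent:", yes_latency(CONGRUENT))
    print("incongruent:", yes_latency(INCONGRUENT))
```

Under these assumptions the incongruent pair is slower than the congruent pair
by exactly one penalty, which is the ordinal pattern summarized in
inequality (1); the absolute values carry no empirical weight.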

Arguing from the perspective of Forster's (1979) language-processing
system, it might be expected that the rejection of pseudonouns should be
retarded by grammatical consistency. The negative outputs from the lexical
processor and message processor will contrast with the positive output from
the syntactic processor when the pseudonoun target and its context are in
grammatical agreement. To arrive at the appropriate "no" response the
decision maker will have to resolve the inconsistency of outputs and the bias
to respond "yes." In two previous experiments examining the effects of
minimal grammatical contexts on lexical decision it was observed that
pseudonouns were rejected more slowly when the preceding item was a
grammatically congruent preposition (Lukatela et al., 1983) and pseudoverbs
were rejected more slowly when the preceding item was a grammatically
congruent personal pronoun (Lukatela et al., 1982). In the present
experiments, however, there is no statistically significant evidence for the
slowing of negative decisions by grammatical agreement.

To account for the indifference of rejection responses to grammatical
congruency requires making explicit a process that is implicit in the above
account of acceptance responses, namely, suffix stripping. According to the
view of Taft and Forster (1975), perceiving an inflected adjective or noun
involves decomposing the item into its stem and suffix (see also Taft, 1981;
Stanners et al., 1979). In performing lexical decision, the representation of
the stem morpheme is accessed by the lexical processor and the
appropriateness of the inflected ending is determined on the basis of the
information stored with the stem's representation. A similar decomposition
must occur for pseudoadjectives and pseudonouns except that for these items
there would be no specific representation of the stem morpheme to be
accessed, only close approximations.

It might be supposed that where the lexical processor focusses on the
word stem, the syntactic processor focusses on the bearers of grammatical
information, i.e., roughly, the suffixes of open-class words and the free
morphemes of closed-class words. Whatever the bearers in any given
context-target situation, assessing a grammatical fit takes time. Indeed, the
difference between the present results and previous results with regard to
negative responses might suggest that discovering the grammatical consistency
in an adjective-pseudonoun or pseudoadjective-pseudonoun pair is slower than
discovering the grammatical consistency in, say, a preposition-pseudonoun
pair. The idea is that the longer the time taken by the syntactic processor
to arrive at an output the less the likelihood that the activity of the
syntactic processor will influence the time course of the lexical decision;
an internally defined deadline on response selection must be assumed. For the
preceding suggestion to be realizable it might have to be the case that (1)
the grammatical link between closed-class function words (e.g., prepositions,
pronouns) and open-class content words (e.g., nouns, verbs, adjectives) is
"stronger" and more rapidly assessed than the grammatical link between
open-class content words (e.g., the link between adjectives and nouns); and
(2) the syntactic processor can be influenced by the lexical processor.
Recall that in the present experiments, although the lexical decision on a
pseudonoun in the context of a pseudoadjective was not affected by
grammatical consistency, the lexical decision on a noun in the same context
was markedly affected. In short, the lexical status of the target made a
difference, and that status is determined by the lexical processor.
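
The deadline idea in assumption (1) can be sketched as follows. This is a
minimal illustration, not a fitted model: all timing constants are
hypothetical, the function name rejection_latency is introduced here, and
the sketch does not address assumption (2), the influence of the lexical
processor on the syntactic processor.

```python
BASE_REJECT_MS = 750   # hypothetical time to reject a pseudonoun
CONFLICT_MS = 40       # hypothetical cost of resolving a "yes" bias
DEADLINE_MS = 600      # hypothetical internal deadline on response selection

FAST_SYNTAX_MS = 300   # hypothetical: closed-class context (e.g., preposition)
SLOW_SYNTAX_MS = 900   # hypothetical: open-class context (e.g., adjective)


def rejection_latency(syntactic_ms: int, congruent: bool) -> int:
    """Latency to reject a pseudonoun target.

    A positive syntactic output slows rejection only if the context and
    target agree and the syntactic processor finishes before the deadline.
    """
    arrives_in_time = syntactic_ms < DEADLINE_MS
    if congruent and arrives_in_time:
        return BASE_REJECT_MS + CONFLICT_MS
    return BASE_REJECT_MS


if __name__ == "__main__":
    for label, syntax_ms in (("closed-class", FAST_SYNTAX_MS),
                             ("open-class", SLOW_SYNTAX_MS)):
        print(label,
              rejection_latency(syntax_ms, congruent=True),
              rejection_latency(syntax_ms, congruent=False))
```

With these placeholder values a congruent closed-class context slows
rejection, whereas a congruent open-class context leaves it unchanged because
the syntactic output misses the deadline.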

In conclusion, evidence has been presented for the influencing of lexical
decisions about inflected nouns by weakly associated inflected adjectives that
are grammatically consistent or inconsistent with their target nouns. This
effect seems to be mediated by a process that evaluates the grammar of a noun
and its adjectival context primarily on the basis of the inflected morphemes.
Although this effect demonstrated in "nonpriming contexts" (Seidenberg et al.,
1982) can be referred to as grammatical priming (Lukatela et al., 1982;
Lukatela et al., 1983), it appears to be a postlexical effect related to, but
distinct from, the priming mechanisms of automatic spreading activation and
context-induced attentional processing (Neely, 1977; Posner & Snyder, 1975)
that have been identified in "lexical priming" contexts (Seidenberg et al.,
1982). Lukatela et al. (1982) concluded that the grammatical priming of
inflected verbs by pronouns and vice versa was automatic. Their conclusion
was based in part on the observation that pronominal facilitation of verbs was
virtually complete when the onsets of context and target were separated by
only 300 ms. They recognized, however, that this automaticity did not refer to
spreading activation. It is supposed that the present example of grammatical
priming is also automatic but the kind of automaticity being referred to is
closer to that suggested by de Groot et al.'s (1982) notion of an automatic
checking for coherence (see also West & Stanovich, 1982) than it is to the
more familiar notion of an automatic spreading of influences among connected
representations in the internal lexicon.

Abstract. This study investigated serial recall by congenitally,
profoundly deaf signers for visually specified linguistic information
presented in their primary language, American Sign Language (ASL), and in
printed or fingerspelled English. There were three main findings. First,
differences in the serial-position curves across these conditions
distinguished the changing-state stimuli from the static stimuli. These
differences were a recency advantage and primacy disadvantage for the ASL
signs and fingerspelled English words, relative to the printed English words.
Second, the deaf subjects, who were college students and graduates, used a
sign-based code to recall ASL signs, but not to recall English words; this
result suggests that well-educated deaf signers do not translate into their
primary language when the information to be recalled is in English. Finally,
mean recall of the deaf subjects for ordered lists of ASL signs and
fingerspelled and printed English words was significantly less than that of
hearing control subjects for the printed words; this difference may be
explained by the particular efficacy of a speech-based code used by hearing
individuals for retention of ordered linguistic information and by the
relatively limited speech experience of congenitally, profoundly deaf
individuals.

Hearing individuals have been shown to use a speech-based code in the
short-term recall of linguistic information, whether spoken or printed
(Conrad, 1964; Wickelgren, 1965). Their recall performance is similar in the
two cases except for a recency advantage favoring spoken over printed items in
the last serial positions (Corballis, 1966; Murray, 1966). Because the
orthography of English is a secondary representation derived from the primary
or basic spoken language (Mattingly, 1972), it is not surprising that
orthographic representations are recoded into a speech-based code. In
addition, a speech-based code may be especially useful when the memory task
calls for recall of ordered information (Baddeley, 1979; Crowder, 1973;
Hanson, 1982; Healy, 1975).

The relations among primary language, coding strategy, and recall
performance become more difficult to unravel when we consider bilingual deaf
individuals who use American Sign Language (ASL) as a primary language and
English as a secondary language. The term "primary language" refers to a
natural language in the form in which it functions as a principal means of
communication among members of a speech community. Writing systems and other
invented representations that are based upon natural languages are viewed as
nonprimary derived systems.


ASL is the primary visual-gestural language of the deaf community in the
United States and Canada, and is acquired as a native language by children of
deaf parents. Structural differences between signed and spoken languages
reflect differences between auditory-vocal and visual-gestural channels of
communication. For example, spoken languages are characterized by sequential
forms of structuring at the abstract phonological and morphological levels.
Words are composed of sequentially arranged phonemes, and morphological
processes typically add one or more prefixes and/or suffixes (each composed of
one or a series of phonemes) to a stem. In contrast, ASL is strikingly
different from spoken languages in the extent to which it utilizes
simultaneously structured units in lexical and morphological composition
(Bellugi, 1980; Klima & Bellugi, 1979). Signs, the lexical items of ASL, are
composed of several co-occurring formational parameters (Stokoe, Casterline,
& Croneberg, 1965), and morphological relations are expressed by spatial and
temporal modifications of the basic form of a sign (Bellugi, 1980).*


Those who use ASL as a primary means of communication also use
fingerspelling for concepts lacking a sign. Fingerspelling is a manual form
of English orthography that assigns a unique hand configuration to every
letter of the English alphabet; as such, it is a changing-state representation
of the graphic form of a spoken language. Fingerspelling is not used as a
primary means of communication by members of the deaf community (Battison,
1978). Although fingerspelled words may often occur within signed sentences,
this letter-by-letter sequential representation of English words differs
considerably from the co-occurring formational parameters of ASL signs.

No writing system in use is based upon ASL, and educated deaf American
signers read and write in English. But the use of ASL and of written or
fingerspelled English by deaf bilinguals is quite different from the use of
two spoken languages by hearing bilinguals. For a deaf person, learning the
orthography (whether through writing or fingerspelling) of English means
learning an orthographic visual system derived from a primary form to which he
or she does not have normal access. In contrast, hearing bilinguals do have
normal access to the primary forms of both languages that they use. Moreover,
the significant structural differences between ASL and English at both the
lexical and grammatical levels require the ASL-English bilingual to know two
radically different forms of linguistic structuring. The bilingual who uses
two spoken languages is required to know one form of linguistic structuring,
that characterizing spoken languages.

The present research examined serial-order recall by deaf signers and
addressed the question of how coding strategies and recall performance are
affected by the requirement to remember ASL in contrast to English.
Differences in performance that may stem from the presentation of English
words by fingerspelling and print were also examined. The hypotheses
underlying this work are discussed in the following sections on
serial-position effects, coding, and accuracy of recall.

Serial-Position  Effects 

Although hearing subjects use a speech-based code for recall of both
spoken and printed word lists, auditory presentation results in a recency
advantage over visual presentation (for a review of this research, see Penney,
1975). This advantage for the more recently presented items occurs whether
the experimenter or the subject reads the stimuli aloud. On the basis of such
findings, the critical variable appears to be hearing the items. The
"modality effect" was originally attributed to the fact that information in
pre-categorical acoustic storage (PAS) has greater durability than information
in an iconic sensory representation (Crowder & Morton, 1969).

However, further research provided evidence for similar effects in the
visual modality in the absence of acoustic information, thus casting doubt on
the PAS explanation for the recency advantage. Findings of recency advantages
for ASL signs (Shand, 1980), moving hand shapes (Campbell, Dodd, & Brasher,
1983), lipread items (Campbell & Dodd, 1980; cf. Crowder, 1983), mouthed
items (Nairne & Walters, 1983), and items vocalized "aloud" by deaf subjects
(Engle, Spraggins, & Rush, 1982) are all incompatible with an explanation
based on acoustic advantage.

Two alternative accounts to the PAS explanation have been proposed.
First, the difference in recency favoring, for example, spoken, lipread, and
signed information over orthographic information may reflect an advantage in
recall of primary-language input over nonprimary (printed) input (Campbell &
Dodd, 1980; Campbell et al., 1983; Nairne & Walters, 1983; Shand, 1980; Shand
& Klima, 1981). Second, this effect may be attributed to an advantage in
remembering changing-state information over remembering static information
(Campbell & Dodd, 1980; Campbell et al., 1983; Nairne & Walters, 1983).
Hereafter, the term "dynamic" will be used to mean "changing-state."

It is important to note that recall differences between lists of words
that are heard and lists that are silently read are restricted to the recency
portion of the curve, with a recency advantage for the words that are heard.
Thus, there is an overall advantage for the heard lists. However, the recency
advantage for lipread and for mouthed lists does not yield an overall
advantage over printed (silently read) lists. This is because recall of
lipread and mouthed lists is poorer than recall of printed lists at earlier
serial positions. Researchers have tended to focus on the similarity in
recency effects among mouthed, lipread, and spoken input conditions, without
giving adequate attention to the fact that spoken input results in the best
recall overall. The dynamic-presentation hypothesis and the primary-language
hypothesis must therefore be examined with respect to effects that span the
entire serial-position curve.

The present study was designed to separate serial-position effects
attributable to primary language from those attributable to dynamic
presentation. Serial-position functions that distinguished fingerspelled and
printed English lists from lists of ASL signs would provide support for the
primary-language hypothesis. On the other hand, serial-position functions
that distinguished the signed and fingerspelled lists from the printed lists
would provide support for the dynamic-presentation hypothesis.
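
The way the three presentation conditions separate the two hypotheses can be
laid out explicitly. The sketch below only restates the design logic given
above; the attribute names and the function predicted_grouping are
introduced here for the illustration.

```python
# Each presentation condition, coded on the two properties at issue.
CONDITIONS = {
    "ASL signs": {"primary_language": True, "dynamic": True},
    "fingerspelled English": {"primary_language": False, "dynamic": True},
    "printed English": {"primary_language": False, "dynamic": False},
}


def predicted_grouping(attribute):
    """Split the conditions by one property.

    Returns (conditions having the property, conditions lacking it); a
    hypothesis based on that property predicts that the serial-position
    functions differ between, but not within, the two groups.
    """
    having = sorted(name for name, props in CONDITIONS.items() if props[attribute])
    lacking = sorted(name for name, props in CONDITIONS.items() if not props[attribute])
    return having, lacking


if __name__ == "__main__":
    print("primary-language hypothesis:", predicted_grouping("primary_language"))
    print("dynamic-presentation hypothesis:", predicted_grouping("dynamic"))
```

The fingerspelled condition is the critical one: it is grouped with print
under the primary-language hypothesis and with ASL signs under the
dynamic-presentation hypothesis.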

Coding


Research with deaf signers can also provide insight into the question of
whether a code based on one's primary language is useful when the recall task
involves information whose linguistic structure is quite different from that
of the primary language. Shand (1982; Shand & Klima, 1981) suggested that the
primary code is the natural and most efficient code for short-term recall of
linguistic information. Recoding by hearing individuals from print into a
speech-based code takes advantage of the systematic relation between the
spoken form and its orthography (Mattingly, 1972). However, there is no such
systematic relation between ASL signs and English orthography.

Simultaneously occurring parameters of movement, place of articulation
within the signing space, and hand configuration are the sublexical components
of ASL signs (Stokoe et al., 1965). These formational parameters (cheremes or
primes) evidently support recall of signs by deaf signers much as phonetic
parameters of speech support recall of spoken information by hearing
individuals (Bellugi, Klima, & Siple, 1975; Hanson, 1982; Poizner, Bellugi, &
Tweney, 1981; Shand, 1982). Thus, Bellugi et al. (1975) found intrusion errors
suggesting sign-based coding of ASL signs by deaf signers on a serial-recall
task. The majority of the intrusion errors were signs that differed from a
correct response by one formational parameter. For example, some of the
subjects reported JEALOUS for CANDY. The signs for JEALOUS and CANDY are a
minimal pair in that they have the same place of articulation and movement;
they differ only in hand configuration. Likewise, some subjects reported
NEWSPAPER for BIRD; these two signs share movement and hand configuration and
differ only in place of articulation.
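The notion of a minimal pair described above can be sketched in a few lines
of Python. This is only an illustration: the class name, the function names,
and above all the parameter values ("P1", "M1", "H1", and so on) are
hypothetical placeholders, not phonological descriptions of the actual signs.
The only facts taken from the text are which parameters each pair shares.

```python
from typing import NamedTuple


class Sign(NamedTuple):
    """A sign described by its three formational parameters."""
    place: str      # place of articulation
    movement: str
    handshape: str  # hand configuration


def differing_parameters(a: Sign, b: Sign) -> list[str]:
    """Return the names of the parameters on which two signs differ."""
    return [name for name in Sign._fields if getattr(a, name) != getattr(b, name)]


def is_minimal_pair(a: Sign, b: Sign) -> bool:
    """Two signs form a minimal pair if they differ on exactly one parameter."""
    return len(differing_parameters(a, b)) == 1


# Placeholder values: only the pattern of shared vs. differing parameters
# follows the text; the labels themselves are invented for illustration.
CANDY = Sign(place="P1", movement="M1", handshape="H1")
JEALOUS = Sign(place="P1", movement="M1", handshape="H2")
BIRD = Sign(place="P2", movement="M2", handshape="H3")
NEWSPAPER = Sign(place="P3", movement="M2", handshape="H3")
```

Under these placeholder codings, JEALOUS/CANDY differ only in hand
configuration and NEWSPAPER/BIRD only in place of articulation, which is the
pattern the intrusion errors showed.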

Evidence for both sign-based and speech-based recoding of printed words
by deaf subjects has been obtained in serial-order recall tasks (Hanson, 1982;
Lichtenstein, in press; Shand, 1982). Subject characteristics associated with
coding preferences suggest that speech-based recoding is typically used by
those prelingually, profoundly deaf adults who are better readers and who have
better speech production skills (Lichtenstein, in press). A shortcoming of
previous studies was that they compared the performance of different groups of
subjects on the different stimulus types. Furthermore, they never included
fingerspelled English. Presenting ASL signs, printed English words, and
fingerspelled English words to the same group of deaf signers in the present
study made it possible to ascertain whether deaf individuals changed
strategies as the stimuli changed or maintained a preferred strategy, such as
sign-based or speech-based coding. In order to provide English words that
were compatible with a sign-based code, half of the fingerspelled and printed
words were chosen because they had readily available sign translations
("high-signability" words); the other half, because they did not
("low-signability" words). If deaf subjects recode into signs and recoding
into one's primary language is the most natural and efficient strategy (Shand,
1982), then two outcomes might be predicted. First, high-signability words
should be recalled more accurately than low-signability words. Second, recall
performance on high-signability words should provide evidence of
sign-intrusion errors.

Accuracy of Recall

In general, when congenitally, profoundly deaf individuals perform a task
that calls for ordered recall of English words or letters, they do not perform
as well as hearing subjects (Belmont & Karchmer, 1978; Belmont, Karchmer, &
Pilkonis, 1976; Hanson, 1982; MacDougall, 1979; Wallace & Corballis, 1973).
Belmont and Karchmer argued that the generally poorer performance of deaf
individuals reflects a "mismatch" between the native language (ASL) and the
language of the information to be recalled (English). However, even on
serial-recall tasks involving ASL signs, deaf signers do not remember as many
items as hearing subjects tested on the signs' printed (Hanson, 1982) or
spoken English equivalents (Bellugi et al., 1975). Moreover, Hanson found that
deaf subjects did perform as well as hearing subjects on tasks that called for
free recall of printed English words. The nature of the ordered-recall task,
rather than characteristics of the input, may actually favor hearing
individuals.


Recent studies indicate that the speech code is particularly useful for
retaining order information (Baddeley, 1979; Crowder, 1978; Hanson, 1982;
Healy, 1975). For deaf subjects, accuracy of recall has been found to correlate
with the use of a speech-based code; those who use this code efficiently
recall more than those who use it inefficiently or not at all (Conrad, 1979;
Hanson, 1982; Lichtenstein, in press). Therefore, it seems that the
speech-based code may facilitate serial-order recall in a way that alternative
coding mechanisms, including sign-based coding of ASL signs, do not.
Furthermore, it is likely that the use of the speech-based code by deaf
individuals is not as effective as it is for hearing people. The present study
examined the recall performance of deaf subjects, who were highly proficient
in English as well as in ASL, and asked whether accuracy of recall differs as
a function of the type of linguistic input (dynamic vs. static; primary vs.
nonprimary) or whether serial recall is, regardless of input characteristics,
a particularly difficult task for individuals who do not have normal access to
speech.

Experiment 

This experiment compared the performance of congenitally, profoundly deaf
signers when presented with English words and ASL signs for serial-order
recall. The presentation mode of the English words was varied so that some were
printed and others were fingerspelled. All the deaf subjects used ASL as
their primary means of communication. The recall performance of two groups of
deaf subjects was compared in order to find out whether there are performance
differences between native and nonnative signers. Members of one group
acquired ASL as a native language from deaf parents, and members of the other
group learned ASL outside the home in the early school years. A normal-hearing
control group was tested on the printed stimuli.

Method 

Subjects 

All subjects were tested individually and were paid for their
participation.


Deaf subjects. Twenty congenitally, profoundly deaf subjects participated
in the short-term memory experiment; two were eliminated because their
hearing loss was less than the criterion for profound deafness (85 dB,
better-ear average). Background information gathered from the subjects
indicated that they all used ASL as their primary means of communication,
supplemented by fingerspelling. Eight of the subjects were born to deaf
parents and had acquired ASL as a native language (native signers), while 10
of the subjects had hearing parents and learned ASL outside the home in the
early school years (nonnative signers). All subjects were currently attending
or were recent graduates of Gallaudet College, a liberal arts college for deaf
students.

Twenty congenitally deaf adults served as control subjects on a perceptual
task, described below. Nine of these subjects had participated in the memory
experiment several months before. Each had a hearing loss of at least 70
dB in the better ear. They were all students or graduates of Gallaudet
College and reported using ASL as a primary means of communication.

Hearing  Subjects.  Ten  hearing  subjects  were  recruited  from  among  Yale 
University  students  and  affiliates.  They  were  native  speakers  of  English  who 
reported  no  history  of  hearing  impairment.  Because  the  hearing  subjects  were 
tested  on  both  sets  of  printed  stimuli,  10  subjects  provided  sufficient  data 
for  comparison  with  the  deaf  subjects. 

Stimuli 

Stimulus lists were constructed from 141 high-signability (HS) English
nouns and 94 low-signability (LS) English nouns. All were words considered to
be commonly known by college-age adults, and were selected with the assistance
of a deaf native signer. HS words were matched with LS words for frequency of
occurrence in printed English (Kučera & Francis, 1967). HS words were
randomly assigned to each of three presentation conditions: signs,
fingerspelling, and print. LS words were randomly assigned to fingerspelling
or print conditions. These assignments produced one set of stimuli. A second
set of stimuli was constructed by reassigning printed items to fingerspelling
or signs, reassigning fingerspelled items to signs or print, and reassigning
signed items to print or fingerspelling, in order to partially counterbalance
the assignment of words to presentation conditions. Thus, the following five
conditions were obtained for both sets of stimuli: (1) American Sign Language
signs; (2) HS fingerspelled English words; (3) LS fingerspelled English words;
(4) HS printed English words; and (5) LS printed English words. Each
condition contained 42 nouns, in seven lists of 6 nouns each. Previous work
with deaf subjects indicated that a list containing 6 nouns could be expected
to produce both primacy and recency serial position effects (Bellugi et al.,
1975). An additional 5 lists of 5 nouns provided practice blocks.
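The noun counts given above can be checked with a short calculation. The
sketch below is not part of the original method; it assumes that the one
practice list preceding each condition was drawn from the same signability
class as that condition (HS for signs, HS fingerspelling, and HS print; LS for
the two LS conditions). The constant and function names are illustrative.

```python
ITEMS_PER_LIST = 6        # nouns per test list
LISTS_PER_CONDITION = 7   # test lists per condition
PRACTICE_ITEMS = 5        # nouns in the single practice list per condition

# Assumption: each practice list uses words of the same signability class
# as the condition it precedes.
HS_CONDITIONS = ["signs", "HS fingerspelling", "HS print"]
LS_CONDITIONS = ["LS fingerspelling", "LS print"]


def nouns_needed(conditions):
    """Total nouns required for the test and practice lists of these conditions."""
    test_items = len(conditions) * LISTS_PER_CONDITION * ITEMS_PER_LIST
    practice_items = len(conditions) * PRACTICE_ITEMS
    return test_items + practice_items


hs_total = nouns_needed(HS_CONDITIONS)  # 3 * 42 + 3 * 5
ls_total = nouns_needed(LS_CONDITIONS)  # 2 * 42 + 2 * 5
```

Under that assumption the totals come to 141 HS nouns and 94 LS nouns, which
agrees with the counts reported for the stimulus pool.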

Procedure 


All stimulus lists were videotaped at a rate of 2 sec per trial. A native
signer recorded the signed and fingerspelled lists on videotape; for maximal
visibility, she was framed from forehead to waist. The signer maintained
a neutral expression throughout the taping session. Printed words were
videotaped directly from an Atari 400 computer and were displayed for 1.5 sec
with a .5-sec interstimulus interval. Stimuli in each condition were recorded
in seven continuous lists of six nouns each. One practice list preceded each
of the five conditions.

The order in which stimulus conditions were presented was partially
balanced across subjects as follows: There were five orders of presentation for
each stimulus set and no condition ever occurred in the same ordinal position
twice. Four subjects were tested on each of the orders. Order 1 was based on
differences in mode of presentation: (a) Signs; (b) HS Fingerspelling; (c) LS
Fingerspelling; (d) HS Print; (e) LS Print. Order 2 was also based on mode
differences but it involved a rearrangement of the ordering of signs,
fingerspelling, and print modes: (a) HS Print; (b) LS Print; (c) HS
Fingerspelling; (d) LS Fingerspelling; (e) Signs. Order 3 arranged lists by
signability differences: (a) LS Print; (b) LS Fingerspelling; (c) HS Print;
(d) Signs; (e) HS Fingerspelling. Order 4 arranged lists by signability in a
different ordering than Order 3: (a) HS Fingerspelling; (b) HS Print;
(c) Signs; (d) LS Print; (e) LS Fingerspelling. Order 5 mixed modes and
signability in a random fashion: (a) LS Fingerspelling; (b) Signs; (c) LS
Print; (d) HS Fingerspelling; (e) HS Print.
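The statement that no condition occupied the same ordinal position twice
describes a Latin square, and this can be checked mechanically. The sketch
below is illustrative only. The HS/LS labels for Orders 3 through 5 are an
inference from that constraint together with the presentation modes listed for
each order, not an independent record; the abbreviations ("HS-F", "LS-P", and
so on) and the function name are hypothetical.

```python
# Five presentation orders; F = fingerspelling, P = print.
# HS/LS labels in Orders 3-5 are inferred from the no-repeat constraint.
ORDERS = {
    1: ["Signs", "HS-F", "LS-F", "HS-P", "LS-P"],
    2: ["HS-P", "LS-P", "HS-F", "LS-F", "Signs"],
    3: ["LS-P", "LS-F", "HS-P", "Signs", "HS-F"],
    4: ["HS-F", "HS-P", "Signs", "LS-P", "LS-F"],
    5: ["LS-F", "Signs", "LS-P", "HS-F", "HS-P"],
}


def is_latin_square(orders):
    """True if every condition appears once per order and once per position."""
    rows = list(orders.values())
    conditions = set(rows[0])
    rows_ok = all(
        len(row) == len(conditions) and set(row) == conditions for row in rows
    )
    # zip(*rows) groups the conditions that share an ordinal position.
    columns_ok = all(set(column) == conditions for column in zip(*rows))
    return rows_ok and columns_ok
```

With these labels each of the five conditions appears exactly once in each
ordinal position across the five orders.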


Deaf subjects were tested on all five conditions by a native signer who
provided both printed and signed instructions; nine of the subjects were
tested on one set of stimuli and nine on the other. The subjects were told that
they would see lists of nouns presented by various modes: ASL signs, printed
English, and fingerspelled English. A message printed on the screen indicated
the termination of each list. The subjects were instructed to watch the
screen and to write the words they had just seen, in serial order, on the
answer sheet provided. The answer sheet included the numbers 1 through 6 for
each list with blank spaces for responses. The subjects were not prevented
from recording words in any order. It was, however, required that words
appear in their correct serial positions. Bellugi and Siple (1974) reported
that deaf signers' recall performance with written report of signs was as good
as their recall performance with signed report.


To control for possible dialectal variations on the interpretations of
the signs and to ensure a fair scoring procedure, a control group of deaf
subjects was tested in a perceptual task. These subjects were asked to watch
the signed portions of the videotapes and to simply write down the English
translation of each sign.


The hearing subjects were tested by a hearing experimenter who provided
both printed and spoken directions. Stimuli for the hearing subjects, who
served as partial controls in this experiment, consisted of the printed
conditions only. Each hearing subject saw both sets of printed stimuli.


Scoring 


All subjects' responses in the memory task were scored as follows: Items
were marked correct if they appeared in the proper serial position in the
current list. Dialectal differences were taken into account when scoring the
answer sheets from signed trials; a response on the memory task that matched a
response in the correct serial position on the perceptual task was scored as
correct. Because there were seven lists in each condition, seven was the
maximum score possible at each serial position for each condition.
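The scoring rule just described can be sketched as follows. This is a minimal
illustration, not the scoring materials actually used: the function names, the
case-insensitive matching, and the representation of the perceptual-task
responses as a set of accepted glosses per serial position are all
assumptions.

```python
def score_list(targets, responses, alternates=None):
    """Score one list: 1 per serial position if the response is acceptable there.

    targets    -- the presented items, in order
    responses  -- what the subject wrote in each slot (None or "" for a blank)
    alternates -- per-position sets of glosses given for that sign on the
                  perceptual task (assumed representation); used for signed
                  trials only
    """
    if alternates is None:
        alternates = [set() for _ in targets]
    scores = []
    for position, target in enumerate(targets):
        response = responses[position] if position < len(responses) else None
        given = (response or "").strip().lower()
        acceptable = {target.lower()} | {alt.lower() for alt in alternates[position]}
        scores.append(int(given in acceptable))
    return scores


def position_totals(per_list_scores):
    """Sum scores over lists; with seven lists the maximum per position is 7."""
    return [sum(column) for column in zip(*per_list_scores)]
```

An item written in the wrong slot earns no credit, which matches the
requirement that words appear in their correct serial positions. For example,
with hypothetical alternates, `score_list(["pope", "debt"], ["pope", "owe"],
[set(), {"owe"}])` credits both positions.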
Results

A three-way ANOVA examined the within-subjects effects of presentation
condition (ASL signs, printed English, fingerspelled English) and serial
position (one through six), and the between-subjects effect of group (native
or nonnative signers), on the number of words the deaf subjects recalled
accurately. For the purposes of this analysis, performance on high- and
low-signability lists was averaged. The analysis revealed a significant main
effect of serial position, F(5, 80) = 30.01, p < .0001, and no significant
effect of either group or condition (both Fs < 1.00). These latter results
indicated that native and nonnative signers could not be differentiated on the
basis of their performance on these serial-recall tasks and that their recall
accuracy was similar for the three presentation conditions. There was,
however, a significant condition X position interaction, F(10, 160) = 3.33,
p < .001, indicating differential effects on the serial-position curve as a
function of condition. This interaction is shown in Figure 1, in which mean
recall is plotted at each serial position for the three conditions: ASL signs,
fingerspelled English words, and printed English words. In this figure, we
have pooled the high- and low-signability trials and averaged across the two
groups of deaf subjects.

Figure 1. Mean number of printed, fingerspelled, and signed items correctly
recalled by deaf subjects at each serial position.

Additional analyses were undertaken in order to understand the nature of
the interaction. The competing hypotheses regarding the effects of primary
language vs. those of dynamic presentation prompted examination of the
differences in serial-position effects as a function of condition. To test the
primary-language hypothesis, one ANOVA compared performance on the print
condition to that on the fingerspelling condition. This was a three-way
analysis, as above, with the exclusion of the sign condition. In this
comparison, the condition X position interaction was also highly significant,
F(5, 80) = 6.84, p < .0001. Thus, the serial-position curves for the print and
fingerspelling conditions differed. Performance on the signed and
fingerspelled trials was compared in the same way. In this ANOVA, the
condition X position interaction disappeared, F(5, 80) = 1.69, p > .06. This
lack of a significant interaction indicates no difference in the
serial-position curves for the signed and fingerspelled trials. To complete
the comparison of dynamic and static conditions, an ANOVA was performed on the
printed and signed trials; the results showed a significant condition X
position interaction, F(5, 80) = 3.66, p < .01. As is evident in the figure,
the deaf subjects were never at ceiling in their recall performance.

Taken together, these analyses indicate that the condition X position
interaction in the original analysis was due to differences between the print
condition on the one hand and the fingerspelling and sign conditions on the
other. This is consistent with the hypothesis that recall of dynamic and
static forms of linguistic information produces different serial-position
curves. In order to localize the effects of dynamic vs. static input on the
serial-position curve, contrasts were done at each serial position, going back
to the original analysis, by comparing recall performance in the static
condition (print) with that in the dynamic conditions (fingerspelling and
signs). The contrast was significant at Position 1, F(1, 34) = 10.149,
p < .01, and Position 2, F(1, 34) = 4.99, p < .05, with accuracy greater in
the print condition than in the other two conditions. The contrast was also
significant at Position 5, F(1, 34) = 10.05, p < .01, and Position 6,
F(1, 34) = 8.67, p < .01, with accuracy greater in the sign and fingerspelling
conditions than in the print condition. The contrast was not significant at
Position 3 or Position 4 (both Fs < 1.00). These results indicate that there
is a recency advantage for the dynamic information (signed and fingerspelled)
but a primacy advantage for the static information (printed). The existence of
some recency gains in all conditions probably reflects the relatively short
list length and the freedom of subjects to record the items they remembered in
any order they wished.

To test specifically for the effects of signability on recall, a three-way
ANOVA was performed on recall accuracy with the within-subjects factors of
signability (HS, LS) X mode (fingerspelling, print) X serial position (1-6).
Because the group factor never entered into any significant main effects or
interactions, native and nonnative subjects were pooled in this and subsequent
analyses. The main effect of signability was nonsignificant (F < 1.00); thus,
the availability of a direct sign translation for an English word did not
enhance its recall. Mean recall of all deaf subjects on the 6-item lists was
3.16 for the HS stimuli and 3.12 for the LS stimuli. The main effect of mode
was also nonsignificant (F < 1.0), but the ANOVA revealed a significant
mode X position interaction, F(5, 85) = 7.42, p < .0001, reflecting the
differences in serial-position effects between static printed input and
dynamic fingerspelled input. As in the previous analysis, the main effect of
serial position was highly significant, F(5, 85) = 30.91, p < .0001.

The analysis of signability indicated that if deaf subjects were using a
sign-based code to recall English words, it was not to their advantage. In
fact, no evidence of sign-based coding of fingerspelled or printed English
words was obtained in an analysis of the intrusion errors. Two deaf native
signers of ASL examined each error on the sign trials and on the HS
fingerspelling and print trials and judged whether or not each was
formationally similar to the target item (i.e., a sign intrusion).
Disagreements between the two signers were rare (occurring on only 4 of the 63
errors that did not include misorderings or blanks) and when they occurred,
they were resolved by consulting a vocabulary book on ASL signs (O'Rourke,
1978). Error analysis of the sign trials showed that of the 63 errors, 30 were
sign intrusions. The results of the perceptual task indicated that these sign
intrusions were not due to perceptual confusions. (Many of the remaining
errors consisted of words that were formationally similar to a word in another
position in the same list.) Table 1 lists examples of sign intrusion errors
and the corresponding target signs for the same serial positions in the
recorded list of signs. In contrast, errors made on the fingerspelled and
printed English conditions did not tend to be sign intrusions. The 79 errors
on the HS trials (not counting misorderings and blanks) included only a single
response that had a sign similar to that of the target item. This was the
intrusion of "caution" for "warning," which is also semantically related. The
other 78 errors could not be differentiated in kind from errors on
corresponding LS printed and LS fingerspelled lists. Errors made on
fingerspelled and on printed lists appeared to be of the same general type, as
indicated by the examples of errors on HS lists provided in Table 2. Patterns
of visual resemblance of item and error pairs are obvious. Such errors could
reflect either visual or phonological confusions; the present experiment was
not designed to distinguish between these two possibilities. Taken together,
these results suggest that well-educated deaf signers employ sign-based coding
in retention of ASL signs but not in retention of English words, whether
printed or fingerspelled.

Finally, recall accuracy of the deaf subjects on the printed trials was
compared with that of the hearing subjects. Collapsing the data across all
deaf subjects, mean recall on the six-item printed blocks was 3.14. (It
should be remembered that for the deaf subjects, mean recall did not differ
significantly as a function of condition: average recall on the
fingerspelling and sign conditions was 3.10 and 3.17, respectively.) Mean
recall of the hearing subjects on the printed blocks was 4.87, and many of
them were at ceiling. An analysis comparing mean recall of the deaf subjects
with that of the hearing subjects indicated that there was a significant
difference in the accuracy of subjects as a function of group (deaf or
hearing), t(26) = 6.85, p < .0001. No valid tests of parallel serial position
differences could be used due to the ceiling performance of so many hearing
subjects.
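As a reading aid, the degrees of freedom reported in this section can be
related to the sample sizes using the conventional formulas for a mixed design
and for an independent-groups t test. The sketch below assumes that
conventional error terms were used, which the text does not state explicitly;
the function and variable names are illustrative.

```python
def within_error_df(effect_df, n_subjects, n_groups=1):
    """Error df for a within-subjects effect: effect df x (subjects - groups)."""
    return effect_df * (n_subjects - n_groups)


def independent_t_df(n_first, n_second):
    """Degrees of freedom for an independent-groups t test."""
    return n_first + n_second - 2


N_DEAF = 18      # 8 native + 10 nonnative signers
N_HEARING = 10

position_df = 6 - 1                            # six serial positions
condition_df = 3 - 1                           # signs, fingerspelling, print
interaction_df = condition_df * position_df    # condition X position

# Analyses that keep the native/nonnative group factor (two groups)
df_position_grouped = within_error_df(position_df, N_DEAF, n_groups=2)
df_interaction_grouped = within_error_df(interaction_df, N_DEAF, n_groups=2)

# Signability analysis, with native and nonnative signers pooled
df_position_pooled = within_error_df(position_df, N_DEAF)

# Deaf vs. hearing comparison on the printed lists
df_t = independent_t_df(N_DEAF, N_HEARING)
```

Under these assumptions the values are 80, 160, 85, and 26, matching the
reported F(5, 80), F(10, 160), F(5, 85), and t(26).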

Discussion 

In the present experiment, there was no significant difference in
performance between the native and nonnative signers tested. This suggests that
native signers and nonnative signers who learned ASL at an early age form a
homogeneous subject group; as far as these tasks are concerned, ASL functions
as a primary language in the same way for both.

Serial-position effects were examined in order to test the
dynamic-presentation hypothesis against the primary-language hypothesis by
comparing deaf signers' recall of English print, fingerspelling, and ASL
signs. The results revealed that the serial-position curves were similar for
the two types of dynamic stimuli (fingerspelling and signs) and that these
curves differed from those obtained for the static stimuli (print). Recall was
better for dynamic stimuli in the last two serial positions but worse in the
first two serial positions.


Table 1

Item and Error Pairs in Recall of ASL Signs

Target Item      Intrusion Error    Parameter(s) of Difference

danger           algebra            movement
zero             photograph         handshape
telegram         declination        handshape
secret           patience           movement
debt             this               movement
instructions     iceskating         handshape
pope             princess           movement, location
fence            screen             handshape, location
rosary           interpreter        movement, location
sandwich         school             movement, location


Table 2

Item and Error Pairs in Recall of Fingerspelled
and Printed English Words

FINGERSPELLING                    PRINT
Target          Error             Target          Error

diamond         almond            heart           horse
wrestling       recycling         concept         corn
ceremony        cemetery          leaf            leather
pipe            pope              interference    inference
bomb            bubble            rosary          rosemary
noon            noun              digit           dignity
temptation      temperature       outlaw          outline
vinegar         vineyard          cure            burn

The recency advantages found for fingerspelled English words and ASL
signs add to a growing body of results indicating that "modality effects" can
be obtained even in the absence of acoustic input (Campbell & Dodd, 1980;
Campbell et al., 1983; Engle et al., 1982; Nairne & Walters, 1983; Shand,
1980). However, the present results are inconsistent with the
primary-language hypothesis, according to which differences would have been
expected between the serial-position curves for the primary-language items
(ASL signs) and those for the nonprimary-language items (fingerspelling and
print). Rather, the present findings provide support for the hypothesis that
the "modality effect" is a reflection of a recency advantage that accrues to
dynamically presented information, regardless of input modality. The primacy
advantage found for printed stimuli over fingerspelled and signed stimuli
resembled the primacy advantage for printed over lipread and mouthed stimuli
reported in previous studies (Campbell & Dodd, 1980; Nairne & Walters, 1983).
As mentioned earlier, the comparison between hearing subjects' recall of
spoken and of printed words reveals only a recency difference between the two
conditions, and consequently, an overall advantage for the spoken words. But
it appears that in spite of the recency advantage for nonacoustic dynamic
stimuli (e.g., signs and lipread, mouthed, and fingerspelled words), such
stimuli show no overall advantage over static stimuli (printed words). What is
important to note in all of these studies is that dynamic information (whether
spoken, signed, fingerspelled, etc.) and static information (printed) yield
different serial-position curves.

As in previous research (Bellugi et al., 1975), analysis of the deaf
subjects' intrusion errors revealed sign-based coding of the ASL signs.
However, the lack of sign intrusion errors on both printed and fingerspelled
English lists suggests that well-educated deaf persons do not recode English
words into signs. In addition, there was no recall advantage for those English
words that have direct sign translations. These results are especially
noteworthy because they suggest that deaf bilinguals can change their recall
strategies depending upon whether they are presented with information in
English or in ASL.


The number of items recalled by deaf signers did not differ as a function
of language, signability, or dynamic-static differences. But their mean
recall was significantly less than that of hearing subjects when the
performance of both groups on the printed trials was compared. These results
are not consistent with the view that the generally poorer performance on
serial-recall tasks by deaf subjects than by hearing subjects stems from the
requirement to remember English. In conjunction with earlier findings that
deaf signers perform as well as hearing individuals on free-recall tasks
involving English stimuli (Hanson, 1982), the present study indicates a
specific difficulty on the part of the deaf signers with serial-order recall.


It is important to realize that difficulties deaf individuals may have
with serial-recall tasks need not interfere with their primary-language
abilities in ASL because of ASL's emphasis on simultaneous production of
linguistic units. But serial-recall performance may become a problem when
deaf individuals learn a spoken language. English, even more than some other
spoken languages, relies heavily on word order in syntactic structuring. Not
surprisingly, deaf children have difficulty in learning to read and write the
complex syntactic structures of English, which place a heavy load on memory
for ordered units (Russell, Quigley, & Power, 1976), and deaf individuals
usually do not read as well as their hearing peers (Bornstein & Roy, 1973;
Karchmer, Milone, & Wolk, 1979). If we are to improve our methods for
teaching deaf persons to read and write, it is crucial that we gain more
insight into the strategies that deaf individuals bring to bear when
remembering English letters, words, and sentences, and the ways in which
deafness affects the perception of and memory for the sequential flow of
linguistic information.

Footnote

¹Sequential structuring does, of course, play a role in ASL, much as
simultaneous structuring does in speech. The essential difference is in the
extent to which sequential structure or parallel structure is part of the
abstract organization of the language. Studdert-Kennedy and Lane (1980)
suggest that speech draws on parallel organization (coarticulation, for
example) to implement an abstract sequential linguistic structure, while ASL
draws on sequential organization of its gestures to implement an abstract
parallel linguistic structure. For example, in ASL the formation of a sign's
handshape may precede the start of its movement. Clearly, there is also a
sequential component in ASL syntax.


DID  ORTHOGRAPHIES  EVOLVE?* 


Ignatius G. Mattingly†


Abstract.  According  to  Gelb  (1963),  writing  has  "evolved"  from  pic¬ 
ture  writing  to  logography  to  syllabic  writing  to  alphabetic  writ¬ 
ing.  It  is  argued  here  that  this  widely  accepted  theory  of  ortho¬ 
graphic  evolution  does  not  really  fit  the  historical  facts  very 
well,  and  that  the  variety  of  orthographies  is  better  explained  on 
linguistic  grounds.  Orthographies  have  to  be  productive,  and  they 
can  manage  this  only  by  providing  devices  for  transcribing  the 
possible  words  in  the  lexicon.  The  very  limited  number  of  different 
ways  in  which  this  is  accomplished  in  different  orthographies  is 
accounted  for  by  the  structural  peculiarities  of  the  languages  that 
the  orthographies  transcribe. 

It is generally believed by linguists, psychologists, psycholinguists and
educators that writing has "evolved." First there was picture writing, then
came logographies, then syllabaries, and finally, the alphabet. At each of
these stages of development, writing became more efficient, because a smaller
inventory of signs was required to do the job. The alphabet is the
culmination of this evolutionary process, and its nearly universal triumph over less
efficient orthographies has been well deserved.

The  evolutionary  view  of  writing  probably  originated  during  the  nine¬ 
teenth  century,  when  most  of  the  decipherments  that  led  to  our  present  knowl¬ 
edge  of  ancient  writing  systems  took  place,  and  theories  of  cultural  evolu¬ 
tion,  inspired  by  the  theory  of  biological  evolution,  were  in  vogue.  The 
evolutionary  view  can  be  found  in  one  form  or  another  in  many  of  the  standard 
accounts  of  the  history  of  writing.  Thus  Jensen  (1970): 

In  the  broader  history  of  writing  we  can  see  then  certain  evolution¬ 
ary  tendencies  emerging.  Above  all  it  is  governed  by  the  law  of 
least  resistance,  according  to  which  every  change  must  in  the  normal 
way  run  from  the  more  difficult  to  the  more  easy,  from  the  more 
complicated  to  the  more  simple;  we  find,  furthermore,  in  keeping 
with  the  general  development  of  civilization,  an  increasing  abstrac¬ 
tion,  a  certain  assimilation  of  the  form  to  the  self-increasing 
intellectuality of the content. (p. 22)

(Cf. also Pedersen, 1962, chap. VI.) And the evolutionary view has been
elaborated into a theory by Gelb, whose A Study of Writing (1963) most of us
who are interested in the psychology of reading turn to for enlightenment
about the natural history of writing.

Gelb  says  that  "writing  had  its  origin  in  simple  pictures"  (p.  190),  ad¬ 
vanced  to  "semasiography"  (that  is,  picture  writing),  and  then  to  "phonogra¬ 
phy,"  which  comprehends  word-syllabic,  syllabic,  and  alphabetic  writing 
(p.  191).  The  development  of  writing  is  said  to  be  "unidirectional"  (p.  200): 

What this principle means in the history of writing is that in
reaching its ultimate development writing, whatever its forerunners
may be, must pass through the stages of logography, syllabography,
and alphabetography in this, and no other, order. Therefore, no
writing can start with a syllabic or alphabetic stage unless it is
borrowed, directly or indirectly, from a system which has gone
through all the previous stages. A system of writing can naturally
stop at one stage without developing farther. Thus a number of
writings stopped at the logographic or syllabic stage. (p. 201)

Thus, just as biological evolution explains the variety of natural species,
orthographic  evolution  is  said  to  explain  the  variety  of  orthographic  species. 

What  I  wish  to  do  here  is  to  reconsider  the  theory  of  orthographic  evolu¬ 
tion.  I  will  argue  that  the  evolution  of  writing  has  been  more  apparent  than 
real,  and  that  the  variety  of  orthographic  species  is  better  understood  from  a 
standpoint  more  linguistic  than  Gelb  adopts.  The  alphabet,  I  will  suggest,  is 
not  necessarily  the  best  way  to  write  all  languages.  For  the  evidence  that 
leads  to  these  conclusions,  I  rely  mainly  on  the  remarkable  erudition  of  Gelb 
himself. 

Is this a matter of more than marginal concern for the psychology of
reading and spelling? I suggest that it may be, for the evolutionary view is
echoed by psychologists concerned with the reading process (Crowder, 1982,
p. 148; Henderson, 1982, p. 7), and the supposed evolution of writing is
sometimes taken to reflect psychological facts and even to suggest teaching
strategies. Citing Gelb (1963), Gleitman and Rozin (1977) say:

...each orthography arose as a gradual refinement and generalization of
resources already implicitly available in its predecessors, as though the
early scripts formed the necessary conceptual building blocks required
for further development.... On these grounds, one can build a plausibility
case (though only that) for organizing reading instruction in terms of a
similar accumulation of conceptions: perhaps ontogeny recapitulates
cultural evolution. (p. 8)

Let us begin with the claim that logography evolved from picture writing.
There are seven ancient traditions of logographic writing: the Mesopotamian,
Proto-Elamite, Proto-Indic, Sino-Japanese, Egyptian, Cretan, and Hittite.
Decipherment has not progressed very far in the cases of Proto-Elamite,
Proto-Indic, and the early Cretan writing, but in the case of the other
logographic traditions there is evidence that the signs were at first iconic
(Gelb, 1963, chap. III), only later becoming arbitrary and non-iconic. The
obvious explanation for this development is that while iconic signs were
suitable for monumental inscriptions, hieratic, commercial, and literary uses
required signs that could be rapidly written rather than slowly drawn. There
was thus an evolution from iconic to non-iconic writing. But regardless of
their graphic form, the signs were from the beginning logograms: they stood
for words (or more correctly, morphemes), not, as is sometimes said,
"concepts" or "meanings." An iconic sign designated a particular word by
suggesting some aspect of its meaning, but the meaning of the logographic text
depended not on these pictorial hints but on the selection and ordering of
the words, just as it does in spoken and written language in general.
Non-iconic signs, arbitrarily associated with words, served the purpose
equally well.

Picture  writing,  on  the  other  hand,  is  non-linguistic.  The  term  is  a 
convenient  cover  label  for  a  fascinating  miscellany  of  assorted  artifacts  from 
preliterate  societies:  rock-drawings  warning  of  danger  nearby,  pictorial 
"letters,"  narratives  and  proverbs,  tribal  and  commercial  identification 
symbols,  calendar  systems,  and  so  on  (Gelb,  1963,  chap.  II). 

In  what  sense  can  logography  be  said  to  have  evolved  from  picture  writ¬ 
ing?  The  claim  would  have  some  substance  if  it  could  be  shown  that  the  signs 
of  some  logography  were  borrowed  from  or  paralleled  those  of  a  particular 
tradition  of  picture  writing,  but  there  appears  to  be  no  example  of  this  sort 
in  any  of  the  logographic  traditions.  The  Mesopotamian  Sumerians  used  both 
cylinder  seals  and  logographic  writing  on  commercial  identification  tags,  but 
there  is  no  relationship  between  the  seals  and  the  writing  (Gelb,  1963, 
p.  65).  If  cultural  evolution  means  anything,  it  must  imply  some  kind  of 
structural  development:  thus  the  computer  can  reasonably  be  said  to  have 
evolved  from  the  loom.  But  linguistic  writing  merely  took  over  the 
communicative  functions  of  picture  writing,  as  the  internal  combustion  engine 
took  over  the  locomotive  functions  of  the  horse;  it  did  not,  in  any  interest¬ 
ing  sense,  evolve  from  picture  writing. 

The second part of Gelb's theory is that syllabaries evolved from
logographies. This claim implies that within a particular orthographic
tradition, there is a period of strictly logographic writing, then, perhaps, a
transitional period, and then a period of strictly syllabic writing. But what
we actually find, in the Mesopotamian, Hittite and Sino-Japanese traditions
(Egyptian will be discussed shortly) is just the transitional period.

The writing in these traditions is what Gelb aptly calls "word-syllabic"
writing, in which logograms and syllabary signs supplement each other. Thus,
in Sumerian and in Japanese writing, the syllable signs are used regularly to
write inflectional morphemes and can also be used to write base morphemes.
Alternatively, a base morpheme can be written with a logogram, and in this
case, a supplementary syllable sign is sometimes used to indicate the
phonological form of the morpheme. In Chinese writing, some of the characters are
simple logograms, but most of them consist of two component signs: the
"radical," one of 214 signs that serve as semantic classifiers, and the "phonetic
complement," a sign that in isolation has a phonological value similar or
identical to that of the compound character. The compound character for
/ku³/, blind, for instance, is composed of the simple signs for /ku³/, drum
and /mu⁴/, eye (Jensen, 1970, p. 170).* Since the phonetic complements have
logographic values of their own, and there are in general quite a few phonetic
complements for a particular syllable (10 for /li⁴/ for example; Wieger,
1927), it might seem a bit eccentric to regard Chinese writing as
systematically syllabic, rather than simply as a case of massive phonetic transfer.
But the fact that a common error in the writing of Chinese is the use of an
incorrect but phonologically accurate phonetic complement (H.-B. Lin, personal
communication) attests to the psychological reality of the syllabary system.

In all these word-syllabic orthographies, the syllable signs clearly
derive from logograms. Thus the syllable sign for /gal/ in Sumerian derives
from the logogram for /gal/, great (Gelb, 1963, pp. 110-111); one of the
phonetic complements for /ku³/ in Chinese, as we have seen, derives from the
logogram for /ku³/, drum; and the Japanese kana for /mo/ derives from the
character for /mo/, hair, borrowed from Chinese /mao²/, hair (Jensen, 1970,
p. 201). But is derivation necessarily to be equated with evolution? Gelb
himself makes it quite clear that there is no period in any of these
traditions during which the writing was strictly logographic; syllable signs occur
in the earliest specimens (Gelb, 1963, pp. 67, 83, 85). Nor did any of these
traditions lead eventually to a strict syllabary, though some of the later
Mesopotamian systems came fairly close (p. 165).

The  Cretan  tradition  is  perhaps  the  one  case  that  supports  the  claim. 
Whether  there  was  a  strictly  logographic  stage  cannot  be  determined  until  the 
early  Minoan  scripts  are  deciphered,  but  the  strictly  syllabic  Cypriote 
orthography  appears  to  have  developed  from  the  earlier  word-syllabic  stage 
represented  by  Cretan  Linear  B  (Gelb,  1963,  p.  154). 

Finally,  Gelb's  theory  claims  that  alphabetic  writing  evolves  from  sylla¬ 
bic  writing.  But  this  part  of  the  theory  depends  crucially  on  Gelb's  particu¬ 
lar  interpretation  of  the  structure  of  the  Egyptian  and  West  Semitic 
orthographies,  and  on  his  presumption  that  the  latter  derive  from  the  former. 

In the Afro-Asiatic family of languages, to which both Egyptian and
Semitic belong, the base morphemes are, in general, simply consonantal
patterns, for example, Egyptian n-f-r, lute; p-r, house; and Semitic k-t-b, to
write; m-l-k, to rule. In actual words, vowels are morphologically inserted
and, together with prefixes and suffixes, distinguish the various forms
derived from the base. Thus the base k-t-b yields in Hebrew [ka'tav], he wrote;
[jix'tov], he will write; [jik'atev], it will be inscribed; [mix'tav], letter;
[ktu'ba], marriage, and many other forms.

Egyptian  writing  is  a  mixture,  often  redundant,  of  logograms  and  signs 
for  consonants  and  for  sequences  of  two  consonants.  These  consonantal  and 
biconsonantal  signs  are  derived  from  the  logograms  by  phonetization.  Thus  the 
sign  for  d-t,  snake,  is  used  for  the  consonant  /d/,  and  the  sign  for  /w-r/, 
swallow, is used for the consonantal sequence /w-r/ in writing /w-r-d/, to be
weary (Jensen, 1970, p. 60). There are no obviously syllabic signs. Vowels
are  not  ordinarily  indicated,  but  in  special  cases,  such  as  foreign  proper 
names,  the  signs  for  the  consonants  /?/,  /j/,  /w/  are  used  for  vowels  /a/, 
/i/,  /u/,  respectively.  This  assignment  of  consonantal  signs  to  vowels  is  not 
arbitrary.  /j/  is  homorganic  with  /i/  and  /w/  with  /u/.  While  /?/  is  not 
homorganic  with  /a/,  it  is  nevertheless  phonologically  reasonable  to 
transcribe  the  low  back  vowel  with  the  sign  for  the  glottal  stop,  the  lowest 
and  most  back  consonant.  As  with  Sumerian  and  Chinese,  there  appears  to  be  no 
historical  period  during  which  the  writing  is  strictly  logographic;  the 
consonantal  signs  are  there  from  the  first  (Gelb,  1963,  p.  74). 

Ancient Semitic writing consists simply of signs that ordinarily stand
for single consonants: thus Hebrew [ka'tav] is written ktb, and [mix'tav],
mktb. But as with Egyptian, consonantal signs are used, when necessary, to
indicate vowels; the signs for /?/, /j/, /w/, aleph, yod and waw, could
indicate /a/; /i/ or /e/; and /u/ or /o/, respectively. This device was used not
only for proper names, [da'wid], David, being written dwjd, but also to avoid
ambiguity in other words, [jix'tov] being written jktwb to distinguish it
from [jik'atev], written jktb.

Pace  Gleitman  and  Rozin,  it  was  surely  not  the  case  that  the  West  Semites 
didn't  "notice"  the  vowels  in  their  language  (1977,  p.  19):  when  it  was  im¬ 
portant  to  write  the  vowels,  they  wrote  them.  On  the  contrary,  what  is  espe¬ 
cially  significant  about  the  Afro-Asiatic  languages  is  that  their  morphologi¬ 
cal  structure  must  have  fostered  awareness  of  segmental  structure  to  a  far 
greater  degree  than  in  the  case  of  Indo-European  languages.  As  I  have  argued 
elsewhere,  such  "linguistic  awareness"  is  not  automatic  and  is  essential  for 
alphabetic  reading  and  writing  (Liberman,  Liberman,  Mattingly,  &  Shankweiler, 
1980; Mattingly, 1972).

The  reason  that  both  Egyptian  and  Semitic  could  be  written  without  con¬ 
sistent  indication  of  vowels  is  that,  in  general,  the  vowels  carried  only 
inflectional  information.  Since  word-order  is  relatively  fixed,  this  informa¬ 
tion  is  for  the  most  part  redundant.  On  the  other  hand,  in  Greek  and  in 
Indo-European  languages  generally,  the  base  morphemes  include  vowels.  Thus, 
when  the  Phoenician  alphabet  was  adapted  to  Greek,  it  became  a  plene  alphabet: 
vowels  as  well  as  consonants  were  regularly  transcribed,  aleph,  yod,  and  waw 
being  used  for  /a/,  /i/,  and  /u/  as  before,  and  three  other  Phoenician 
consonantal signs, he, /h/, heth, /ḥ/, and ayin, /ʕ/, for /e/, /ē/ and /o/,
respectively. 

To  maintain  his  theory  of  orthographic  evolution,  Gelb  has  to  argue, 
since  there  are  no  preceding  West  Semitic  logographies  or  syllabaries,  that 
the  West  Semitic  scripts  derive  from  the  Egyptian.  And  since  he  denies  the 
direct  development  of  an  alphabet  from  a  logography,  he  has  to  argue  that  the 
Egyptian  consonantal  and  biconsonantal  signs  are  really  syllabic. 

In  asserting  the  derivation  of  the  West  Semitic  script  from  the  Egyptian, 
Gelb  very  properly  rejects  the  farfetched  attempts  of  other  scholars  to 
demonstrate  similarities  in  the  forms  of  the  signs  of  the  two  scripts.  His 
argument  relies  on  the  similarity  of  "inner  structure"  (p.  146),  that  is,  the 
use  of  a  limited  set  of  signs  to  express  consonants  but  not  (ordinarily)  vow¬ 
els.  But  this  argument  loses  what  force  it  might  have  in  view  of  the  fact 
that  it  is  the  same  peculiarity  in  morphological  structure  that  made  it  possi¬ 
ble  for  both  languages  to  be  written  in  this  way.  Gelb  might  have  adduced  a 
further  similarity  of  inner  structure:  when  vowels  did  have  to  be  written, 
the  signs  for  the  same  three  consonants,  /?/,  /j/,  and  /w/,  were  used  to  write 
the same three vowels, /a/, /i/, and /u/. But the similarity of Egyptian and
West  Semitic  phonological  inventories  explains  this.  Since  both  had  the  con¬ 
sonants  /?/,  /j/,  /w/  phonologically  related  to  the  vowels  /a/,  /i/,  /u/, 
respectively,  the  signs  for  these  consonants  were  the  obvious  choices  to  write 
the  corresponding  vowels.  Though  the  possibility  cannot  be  ruled  out,  there 
is  no  need,  in  the  absence  of  other  evidence,  to  conclude  that  West  Semitic 
script  is  derived  from  Egyptian  script.  The  linguistic  similarity  of  the 
Egyptian  and  Semitic  languages  is  quite  sufficient  to  account  for  the 
similarity of the two scripts.

As for the Egyptian consonantal signs, Gelb's proposal is that each of
them  represents  a  set  of  syllables  or  disyllables  with  the  same  consonants  but 
varying  (or  zero)  vowels.  Thus  the  biconsonantal  sign  that  other  scholars 
transliterate as mn or m-n is transliterated by Gelb mˣnˣ, with ˣ
standing for whatever vowel is required in context (1963,
pp.  77-78).  From  the  reader's  point  of  view,  this  might  seem  a  distinction 
without  a  difference,  but  for  Gelb  it  is  crucial: 

The  Egyptian  phonetic,  non-semantic  writing  cannot  be  consonantal, 
because  the  development  from  a  logographic  to  a  consonantal  writing, 
as  generally  accepted  by  Egyptologists,  is  unknown  and  unthinkable 
in  the  history  of  writing,  and  because  the  only  development  known 
and  attested  in  dozens  of  various  systems  is  that  from  a  logographic 
to a syllabic writing. (pp. 78-79; original in italics)

But, obviously, this argument is entirely circular; only the theory itself
justifies the syllabic interpretation. One might have supposed that the West
Semitic scripts, at least, could be allowed to be alphabetic without damage to
the theory, but to concede this would obviously undermine the claim of inner
structural similarity between them and the Egyptian script. Thus the West
Semitic script must be syllabic, too, waw, for example, being transliterated
wa, wi, wu (Gelb, 1963, p. 148), and the development of alphabetic writing
must await the Greeks.

This  claim  is  not  only  uncorroborated;  it  also  makes  it  much  more  diffi¬ 
cult  to  account  for  the  emergence  of  the  Greek  plene  alphabet.  If  the  Phoeni¬ 
cian  orthography  was  syllabic,  there  is  no  particular  reason  why  the  Greeks, 
any  more  than  other  Indo-Europeans,  should  have  become  aware  of  the  segmental 
character  of  their  language  when  they  borrowed  this  orthography.  We  should 
expect  to  find  them  using,  at  least  at  first,  a  patched-up  syllabary  like  that 
of  the  Persians.  But  if  it  is  recognized  that  the  West  Semites,  thanks  to  the 
peculiar  morphology  of  their  language,  had  already  arrived  at  the  alphabetic 
principle,  then  the  development  of  the  Greek  alphabet  from  the  Phoenician  al¬ 
phabet  can  be  seen  to  be  simply  a  matter  of  adding  two  more  vowel  signs  and 
using  them  consistently. 

If we do not accept the claim for the development of West Semitic writing
from Egyptian writing, and for the syllabic nature of at least the latter,
then Gelb's theory is in trouble, for it would seem that, insofar as
derivation can be equated with evolution, an alphabet can evolve from a logography
without an intervening syllabic stage, as in the case of Egyptian; and may
even, perhaps, emerge without any precursors, as in the case of West Semitic;
but that no alphabets have developed from syllabic or word-syllabic systems,
for apart from the Ugaritic cuneiform alphabet, of unknown origin (Gelb, 1963,
p. 129), all other alphabets are derived, directly or indirectly, from the West
Semitic consonantal alphabets.

The  theory  of  orthographic  evolution  cannot  be  correct,  for  logography 
cannot  be  shown  to  have  evolved  from  picture  writing  in  any  meaningful  sense; 
syllabaries do not generally develop from logographies; and alphabets do not
develop  from  syllabaries.  What  we  find  instead  are  either  logosyllabic  tradi¬ 
tions:  Mesopotamian,  Hittite,  Cretan,  and  Sino-Japanese;  or  alphabetic  tradi¬ 
tions:  Egyptian  and  West  Semitic.  We  can,  if  we  choose,  regard  as  evolution¬ 
ary  the  development  of  non-iconic  logograms  from  iconic  ones,  or  the  develop¬ 
ment  of  the  Greek  plene  alphabet  from  the  Phoenician  consonantal  alphabet,  but 
these are not the sorts of evolution the theory calls for.

But without the theory, how can we account for the variety of
orthographies? Let us consider this question from a rather different point of
view. The orthography of a language must be productive; that is, it must
enable the user to write any of the infinite number of possible utterances of
the language. Because there are many levels at which an utterance is mentally
represented in production and perception, there are, in principle, many
possible forms that a productive orthography might take. For example, any
utterance of a particular language (in fact, any utterance of any language) can be
written in a general system of phonetic transcription. If such a
transcription were used as an orthography for all languages, any literate person could
read aloud in any language. Or one could imagine an orthography that would be
based on the acoustic properties of utterances (cf. the "visible speech" of
Potter, Kopp, & Green, 1947, and the stylized spectrographic patterns used for
speech synthesis by rule at Haskins Laboratories by Liberman, Ingemann,
Lisker, Delattre, & Cooper, 1959); such an orthography would include just the
information on which the listener to spoken language relies. Or one could
imagine an orthography based on the semantic representations of utterances
(cf. Katz & Fodor, 1963), if indeed such representations really exist (Fodor,
Fodor, & Garrett, 1975); after all, it is the meaning, not the linguistic
structure, that the writer really wants to convey to the reader. But it is
obvious that none of these alternatives would do for a practical orthography,
though it is not easy to say exactly why (see Mattingly, 1984, for some
speculations).

There  is  in  fact  a  very  severe  limitation  on  orthographic  variety.  In 
practical  orthographies,  only  one  basic  principle  has  ever  been  used,  that  of 
transcribing  utterances  of  a  language  as  sequences  of  lexical  items,  that  is, 
words.  I  would  argue  that  all  known  orthographies  are  in  this  sense  lexical, 
varying  only  in  the  specific  ways  in  which  they  happen  to  transcribe  the 
words.  The  lexical  character  of  logographies  seems  obvious,  but  it  might  be 
objected  that  alphabetic  systems  are  essentially  transcribing  the  phonemes  of 
utterances,  and  only  incidentally  the  words.  With  a  well-behaved  orthography, 
like  that  of  Serbo-Croatian,  only  the  spaces  between  the  words  indicate  its 
specifically  lexical  character.  The  point  becomes  clearer  in  the  case  of  an 
eccentric  orthography,  like  that  of  English,  in  which  there  is  usually  more 
than one way to write a particular sound. Thus English [ay], phonologically
/ī/, can be written -igh-, -y, -y(-)e, -i(-)e, -uy. But despite this
variability, there is but one way of writing each of the words sight, try, lye, dyne,
lie, lime, buy.

A  lexical  orthography  can  only  be  productive  if  it  incorporates  a  system 
for transcribing all the words in the language. There is, however, no
principle that can specify just the actual words of a language, and provide
the basis for such a system. Thus /?ayf/, v., to gather truffles on Wednesday
could  perfectly  well  be  an  English  word;  its  absence  from  the  lexicon  is 
accidental.  Nor,  since  the  membership  of  the  lexicon,  though  finite  in  theo¬ 
ry,  is  indefinite  in  practice,  would  it  be  satisfactory  simply  to  list  all  the 
words  and  provide  an  arbitrary  sign  for  each.  Any  word  that  was  inadvertently 
omitted,  or  entered  the  language  after  the  list  was  compiled,  would  be 
unwriteable.  And  the  writer  who  could  not  remember  the  sign  for  a  word  that 
was  on  the  list  would  be  driven  to  paraphrase.  Thus  there  can  be  no  strict 
logographies, for a strict orthography would not be productive; and
accordingly no such stage is actually found in Sumerian, Egyptian, Hittite, or Chinese.

There is, however, a way to specify all possible words in a language.
The  phonetics  and  phonotactics  of  a  language  determine  the  set  of  phonological 
forms that qualify for membership in its lexicon. Thus, while /5ayf/ could be
a word in English, and /kæt/ really is one, /ADc/ and /stwoyg/ could not be.
By exploiting the phonological structure of the language, that is, by some
form of phonetization, an orthography ensures that any possible word can be
transcribed.  This  does  not  mean  that  a  writer  will  always  know  the  standard 
way  to  write  a  particular  word,  or  that  the  reader  will  always  know  what  word 
is  transcribed  by  a  particular  orthographic  form.  It  does  not  preclude  a 
particular  word's  being  standardly  transcribed  in  some  exceptional  or  arbi¬ 
trary  way,  e.g.,  one.  What  it  does  mean  is  that  if  /5ayf/  should  enter  the 
English  language,  there  will  be  at  least  one,  in  fact  several,  ways  to  write 
it;  that  the  writer  who  cannot  recall  the  standard  spelling  of  cat  can  at 
least  write  kat,  and  that  the  reader  confronted  with  a  word  unfamiliar  in  its 
written  form  will  have  a  basis  for  guessing  what  the  word  is. 
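
The productivity argument can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the toy language (six consonants, three vowels, strictly consonant-vowel syllables), the three-word lexicon, and the function names are all hypothetical and do not describe any real language or orthography. The point is only that a list of arbitrary signs fails for an unlisted word, while a phonetizing system can write any form the phonotactics allow.

```python
import re

# Toy phonotactics (hypothetical): every syllable is one consonant
# followed by one vowel.
CONSONANTS = "ptkmns"
VOWELS = "aiu"
SYLLABLE = re.compile(f"[{CONSONANTS}][{VOWELS}]")

# Toy lexicon of "actual" words, each paired with an arbitrary sign,
# as in a strict logography (hypothetical data).
LOGOGRAMS = {"kata": "#1", "muni": "#2", "sipa": "#3"}


def write_logographic(word):
    """Return the listed sign for the word, or None if it is unlisted."""
    return LOGOGRAMS.get(word)


def write_phonetized(word):
    """Return the word's syllable signs, or None if it is not a possible word."""
    syllables, pos = [], 0
    while pos < len(word):
        match = SYLLABLE.match(word, pos)
        if match is None:
            return None  # violates the toy phonotactics
        syllables.append(match.group())
        pos = match.end()
    return syllables or None  # the empty string is not a word


if __name__ == "__main__":
    for form in ["kata", "tuki", "stka"]:
        print(form, write_logographic(form), write_phonetized(form))
```

In this sketch "tuki" plays the part of a possible but accidentally absent word: the sign list has no entry for it, yet the syllabic transcription handles it without any prior provision, while "stka" is rejected by both.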

Although  lexical  items  have  syntactic  and  semantic  as  well  as  phonologi¬ 
cal  properties,  only  the  last  allow  the  specification  of  the  set  of  possible 
words  of  a  language.  Syntactic  properties  are  not  sufficient  to  specify  dif¬ 
ferent  words  uniquely,  and  a  principled  characterization  of  word  meaning  has 
thus  far  eluded  the  efforts  of  linguistic  semanticists  (Fodor,  1977,  chap.  5). 
As  we  have  seen,  however,  semantic  properties  can  nonetheless  play  a  useful 
auxiliary  role  in  orthographies. 

Every  orthography,  then,  achieves  productivity  by  incorporating  some  sys¬ 
tem  for  transcribing  phonologically  the  possible  words  of  the  language.  Since 
the  only  relevant  phonological  units  are  syllables  and  phonemes,  there  are  re¬ 
ally  only  two  ways  to  do  this:  the  syllabic  way  and  the  alphabetic  way,  and 
we  have  seen  that  all  orthographies  make  use  either  of  the  one  or  the  other. 
But  why  must  there  be  even  two  ways?  Why  are  not  all  orthographies  plene 
alphabets?  The  answer  is  that,  to  a  large  extent,  the  morphological  and 
phonological  structure  of  a  language  defines  the  orthographic  options.  There 
are  some  languages  for  which  a  plene  alphabet  would  be  cumbersome  and  redun¬ 
dant,  and  others  for  which  there  is  no  really  satisfactory  method  of  phoneti¬ 
zation.  Moreover,  the  alphabetic  option  becomes  an  obvious  one  only  under 
rather  special  linguistic  circumstances. 

A  Semitic  language,  unless  it  has  borrowed  heavily  from  a  non-Semitic 
language,  has  no  need  of  a  plene  alphabet.  Since  lexical  items  are  consonan¬ 
tal  patterns,  the  vowels  carrying  only  inflectional  information,  an  extremely 
parsimonious  system  of  phonetization  is  possible,  as  the  West  Semitic 
orthographies  demonstrate.  Under  similar  linguistic  circumstances,  Egyptian 
writing  was  able  to  achieve  productivity  in  much  the  same  way.  The  extensive 
and  often  redundant  use  of  logograms  does  not  alter  the  fact  that  the 
uniconsonantal and biconsonantal signs are the true basis of this orthography.

Because  of  their  restricted  syllable  structure,  Sumerian,  Chinese  and 
Japanese  are  less  orthographically  amenable.  Japanese  has  only  7*1  phonotacti- 
cally  possible  syllables  (or  more  exactly,  moras).  Chinese  has  about  1200 
possible  syllables,  but  by  no  means  all  of  them  are  actually  used.  Sumerian 
appears  to  have  been  similarly  restricted.  Restricted  syllable  structure 
surely  promotes  awareness  of  syllables,  and  in  these  cases  a  syllabary  might 
seem to be the obvious phonetization device. But the morphological consequence
of restricted syllable structure, unfortunately, is pervasive homophony,
exacerbated when, as in Chinese and Sumerian, the base morphemes are mostly
monosyllabic. For example, there are 38 different Chinese words with the
phonological  form  /li^/  (Wieger,  1927).  Under  these  circumstances,  a  strict 
syllabary  is  hardly  practical,  for  it  would  give  rise  to  pervasive  homography, 
far  less  tolerable  in  writing,  because  of  the  lack  of  prosodic  information  to 
help  specify  syntactic  structure,  than  pervasive  homophony  in  speech.  For 
these  languages,  a  word-syllabic  system,  in  which  the  ambiguity  of  syllable 
signs  is  reduced  with  the  help  of  logograms,  is  a  reasonable,  if  not  highly 
efficient  solution.  Alphabetic  writing  would  be  no  improvement.  To  replace 
the  syllabic  signs  in  Chinese  and  Japanese  writing  by  alphabetic  ones  would  do 
nothing  to  reduce  homography,  and  to  use  only  an  alphabet  to  write  these 
languages,  convenient  though  it  might  be  for  printers,  would  be  disastrous  for 
readers.

For  many  other  languages,  a  plene  alphabet  is  the  most  efficient  system 
of  phonetization.  But  the  alphabetic  principle  is  not  an  obvious  one.  It  did 
not  occur  to  the  Hittites,  who  used  a  word-syllabic  system  even  though  they 
did  not  have  a  homophony  problem  and  could  have  used  an  alphabet.  It  occurred 
to the Egyptians and the West Semites only because the peculiar character of
the morphology of their languages made them aware of phonological segments.
It  is  certainly  owing  entirely  to  the  West  Semitic  example  that  alphabetic 
writing  is  now  so  widespread. 

It  would,  however,  be  pressing  the  point  too  far  to  say  that  variations 
in  linguistic  structure  account  for  all  orthographic  variety.  Non-linguistic 
factors  assuredly  play  a  role.  The  Akkadians,  for  example,  spoke  a  Semitic 
language  and  would  certainly  have  been  well  advised  to  use  a  consonantal  al¬ 
phabet.  But  being  impressed  by  the  culture  of  the  Sumerians,  they  adopted  the 
Sumerian  orthography  and  made  writing  unnecessarily  complicated  for  themselves 
and  their  Mesopotamian  successors  (Jensen,  1970,  p.  94).  Greek  speakers  on 
the island of Crete used a word-syllabic system, Linear B, no doubt influenced
by the example set by the speakers of the unknown Minoan language written in
Linear A (Gelb, 1963, p. 91 ff.). The bewildering complexities of the
Japanese kanji, borrowed from the Chinese, have a similar historical
explanation (Martin, 1972).

To  summarize,  Gelb's  widely  accepted  theory  of  orthographic  evolution 
must  be  rejected.  Orthography  has  no  relationship  to  picture-language,  and 
there  is  no  sequential  development  from  logography  to  syllabary  to  alphabet. 
The  forms  that  orthographies  have  taken  are  constrained  by  the  requirement 
that  they  must  be  productive,  and  must  transcribe  lexical  items.  The  limited 
variety of orthographies can be explained largely on linguistic grounds.

THE DEVELOPMENT OF CHILDREN'S SENSITIVITY TO FACTORS INFLUENCING VOWEL READING

Abstract. To disambiguate vowel assignment to a vowel digraph in a
word,  readers  must  take  into  account  aspects  of  the  word  context  be¬ 
yond  the  vowel  digraph  units  themselves.  The  present  study  examined 
the  development  of  young  readers'  use  of  this  context  in  two  experi¬ 
ments.  In  the  first  experiment,  first-,  third-,  and  fifth-grade 
children  were  required  to  read  aloud  high-  and  low-frequency  words 
containing  vowel  digraph  units  with  variant  and  invariant  pronuncia¬ 
tions.  Words  containing  vowel  digraph  units  with  variant  pronuncia¬ 
tions  were  further  categorized  by  the  uniformity  of  pronunciation  of 
the  vowel  digraph-final  consonant  unit  as  it  appeared  in  real  words 
(i.e., the orthographic neighborhood consistency).

While  word  reading  accuracy  of  all  groups  was  enhanced  by  word 
frequency,  only  the  third  and  fifth  graders  demonstrated  sensitivity 
to  variation  in  pronunciation  of  the  vowel  digraph  unit.  For  these 
children,  low-frequency  words  containing  vowel  digraph  units  with 
invariant  pronunciations  were  read  with  accuracy  comparable  to  that 
obtained  for  the  high-frequency  words.  In  contrast,  low-frequency 
words  containing  vowel  digraphs  with  variant  pronunciations  were 
still  a  significant  source  of  error  for  the  older  readers,  but 
chiefly  when  they  came  from  inconsistent  orthographic  neighborhoods. 

In  a  second  experiment,  pseudoword  stimulus  items  were  used  to 
examine  further  the  effect  of  the  orthographic  neighborhood  on  vowel 
pronunciation.  The  influence  of  the  vowel  digraph-final  consonant 
unit  in  determining  pronunciations  was  again  indicated  by  limited 
variability  in  pronunciations  of  pseudowords  ending  in  particular 
vowel  digraph-final  consonant  units  from  consistent  orthographic 
neighborhoods.  Where  there  was  variability  in  pronunciation,  the 
initial consonant-vowel digraph structure appeared to be largely
responsible. Both experiments support the hypothesis that with
reading experience, children identify the systematic relationship
between  pronunciation  and  orthographic  structure  and  utilize  that 
knowledge  in  the  pronunciation  of  unfamiliar  words. 

Analysis  of  the  errors  made  by  children  as  they  acquire  skill  in  word 
reading  has  provided  some  clues  to  the  problems  beginning  readers  encounter  in 
identifying  words.  The  well-documented  finding  that,  in  English,  vowel  mis¬ 
readings  occur  with  greater  frequency  than  consonant  misreadings  (Fowler, 
Liberman,  &  Shankweiler,  1977;  Shankweiler  &  Liberman,  1972;  Weber,  1970)  sug¬ 
gests  that  beginners  in  English  experience  particular  difficulty  in  associat¬ 
ing  a  given  orthographic  vowel  unit  with  its  appropriate  pronunciation. 

A  number  of  explanations  have  been  proposed  to  account  for  the  difference 
in  difficulty  between  vowels  and  consonants  (Fowler,  Shankweiler,  &  Liberman, 
1979;  Shankweiler  &  Liberman,  1976).  One  explanation  emphasizes  the  differ¬ 
ences  in  the  linguistic  properties  of  vowels  and  consonants  in  speech  produc¬ 
tion  and  perception,  noting  that  vowels  are  more  fluid  and  generally  less 
categorically defined than consonants (Liberman, Cooper, Shankweiler, &
Studdert-Kennedy, 1967). Another explanation turns on the difference between
vowel and consonant orthography. The preponderance of errors on vowels has
been  attributed  to  the  fact  that  the  same  vowel  may  be  spelled  differently  in 
different  words.  Consonants,  on  the  other  hand,  have  a  more  nearly  one-to-one 
correspondence  between  orthographic  unit  and  phonological  segment.  The  conso¬ 
nant  letters,  with  few  exceptions,  cue  the  same  phonological  segments  wherever 
they  occur,  whereas  the  letters  that  represent  vowels  frequently  have  multiple 
phonological  referents  (Venezky,  1967).  Further  support  for  the  role  of  the 
orthography,  rather  than  the  differences  in  vowel  and  consonant  perception,  in 
accounting  for  the  vowel  error  pattern  is  reported  by  Lukatela  and  Turvey 
(1980).  In  their  examination  of  word  reading  errors  in  Serbo-Croatian,  an 
orthography that includes a simple vowel set but a more complex consonant set,
phoneme  substitutions  on  medial  vowel  segments  were  less  frequent  than 
substitutions  on  initial  or  final  consonant  segments. 

In  view  of  the  complexity  of  the  English  vowel  orthography,  it  is  hardly 
surprising  that  there  are  more  vowel  errors  than  consonant  errors  in  reading 
English  words.  In  order  to  disambiguate  the  vowel  pronunciation,  readers  must 
take  into  account  aspects  of  the  word  contexts  that  are  represented  by  the 
letters  surrounding  the  vowels.  Beginners'  errors  show  that  they  have  not  yet 
learned to do this, but instead use grapheme-phoneme correspondences for
single vowel letters (Fowler et al., 1979). With age and experience, children
narrow  the  range  of  vowel  renderings  with  greater  and  greater  precision,  tak¬ 
ing  more  account  of  the  surrounding  letter  context  (Fowler  et  al.,  1979). 

In  English,  these  surrounding  letter  contexts  differ  in  the  extent  to 
which  they  constrain  the  selection  of  the  appropriate  vowel.  The  context  may 
be  tightly  constrained,  as  in  the  tense  or  long  pronunciation  for  orthographic 
vowel units appearing in the context of the silent-e marker. Or it may be
loosely constrained in a vowel digraph that may have several appropriate
realizations within a particular context. For example, the vowel digraph ou in
the context of gh may be correctly rendered as /au/ in bough, /ʌ/ in tough,
/ɔ/ in thought, /u/ in through, or /o/ in though. In the Fowler et al.
studies, although the stimuli included a wide range of contextual constraints,
the possibly differing effects among them were not considered.

Attempts to construct a model for predicting adults' pronunciations of
pseudowords  containing  vowel  digraph  units  (Johnson  &  Venezky,  1976;  Ryder  & 
Pearson,  1980)  have  suggested  that  the  vowel  pronunciation  could  be  influenced 
either  by  the  frequency  of  occurrence  of  that  unit  without  regard  to  the  con¬ 
text,  that  is,  without  regard  to  the  effect  of  the  final  consonant  or,  alter¬ 
natively,  by  the  context  provided  by  the  final  consonant.  Results  of  those 
investigations  support  a  model  predicting  that  adult  pronunciation  is  highly 
determined  by  frequency  of  orthographic  patterns,  but  the  functional  unit  is 
hypothesized to be the vowel digraph-final consonant structure.

Skilled adult readers have in fact been shown to be sensitive to the
consistency or inconsistency of the pronunciation of medial vowel-final letter
units (Glushko, 1979). Glushko has proposed that, in the course of reading a
word,  an  entire  neighborhood  of  similarly  structured  words  and  their 
pronunciations  is  automatically  activated  in  memory.  Glushko's  "neighborhood" 
includes  all  monosyllabic  words  in  the  reader's  lexicon  that  share  the  same 
medial  vowel  letters  in  combination  with  the  same  letter  units  in  word  final 
position.  Rhyming  words  such  as  seam,  beam,  and  team,  sharing  both  the  medial 
vowel-final  letter  unit  and  a  uniform  pronunciation,  would  thus  constitute  a 
consistent  orthographic  neighborhood;  whereas  the  words  beat,  threat,  and 
great,  although  sharing  the  medial  vowel-final  letter  unit,  fail  to  share  a 
uniform  pronunciation,  and  thus  would  be  classified  as  constituting  an  incon¬ 
sistent  orthographic  neighborhood.  Glushko’s  adult  readers'  performance  was 
influenced  by  the  consistency  or  inconsistency  in  orthographic  neighborhoods 
as  evidenced  by  more  rapid  reading  and  more  limited  variation  in  pronunciation 
of words and pseudowords from consistent orthographic neighborhoods (i.e.,
words  of  similar  structure  sharing  a  uniform  pronunciation).  It  was  also 
indicated  by  a  greater  latency  of  response  and  significant  variation  in 
pronunciation  of  words  from  inconsistent  orthographic  neighborhoods  (i.e., 
words  of  similar  structure  that  fail  to  share  a  uniform  pronunciation). 
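
The consistent/inconsistent distinction described above can be illustrated with a minimal sketch. It uses only the six example words given in the text; the vowel labels ("ee", "eh", "ay") and the function names are informal assumptions introduced for illustration and are not Glushko's notation or procedure.

```python
from collections import defaultdict

# Toy lexicon: word -> (medial vowel + final letter unit, vowel pronunciation).
# The vowel labels ("ee", "eh", "ay") are informal stand-ins, not the authors' notation.
LEXICON = {
    "seam": ("eam", "ee"),
    "beam": ("eam", "ee"),
    "team": ("eam", "ee"),
    "beat": ("eat", "ee"),
    "threat": ("eat", "eh"),
    "great": ("eat", "ay"),
}


def group_by_unit(lexicon):
    """Collect, for each vowel-final letter unit, the vowel pronunciations of its words."""
    neighborhoods = defaultdict(set)
    for unit, vowel in lexicon.values():
        neighborhoods[unit].add(vowel)
    return neighborhoods


def classify_neighborhoods(lexicon):
    """Label a unit consistent if all of its words share one vowel pronunciation."""
    return {
        unit: "consistent" if len(vowels) == 1 else "inconsistent"
        for unit, vowels in group_by_unit(lexicon).items()
    }


print(classify_neighborhoods(LEXICON))
```

A real classification would of course draw on a full listing of monosyllabic words rather than a six-word sample; the sketch only shows the rule being applied.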


The  vowel  digraph  unit  in  many  words  may  be  ambiguous  unless  the  reader 
can  exploit  additional  cues  from  the  other  letters  in  the  word.  The  broader 
context,  as  for  example,  the  final  consonant,  may  supply  such  cues.  Whether 
or  not  it  does  could  depend  on  whether  word  items  from  an  orthographic 
neighborhood  for  that  vowel  digraph-final  consonant  unit  share  a  consistent 
pronunciation.  Thus,  the  final  consonant  might  be  used  to  disambiguate  the 
vowel  digraph,  but  its  use  would  involve  a  complex  context-sensitive  opera¬ 
tion. 

A  study  examining  this  skill  in  second-,  fourth-,  and  sixth-grade  chil¬ 
dren  (Johnson,  1970)  found  that  the  factor  most  likely  to  influence  children's 
selections  was  also  the  frequency  of  occurrence  of  a  particular  pronunciation 
for  a  given  unit,  and  further,  that  with  increasing  grade  level,  children's 
responses  more  closely  reflected  the  pronunciations  of  those  units  as  they  ap¬ 
pear  in  real  words.  Though  mention  is  made  of  some  additional  effects  of  the 
final  consonant  context  and  the  position  of  the  vowel  digraph  unit  within  the 
word,  the  study  was  not  designed  to  investigate  the  development  of  the  influ¬ 
ence  of  context  on  children's  selections  as  a  result  of  reading  experience. 
Nor  did  it  examine  the  effects  of  the  frequency  of  occurrence  of  the  vowel  di¬ 
graph-final  consonant  structure  and  the  consistency  of  pronunciation  of  that 
structure in real words.

To date, there has been no systematic study of the development of
children's use of the final consonant context in disambiguating vowel assignment
to vowel digraph units and their sensitivity to orthographic neighborhood
consistency. An examination of these effects with children may provide insight
into  the  development  of  children's  awareness  of  the  very  complex  relationship 
between  the  orthography  and  the  phonology.  In  addition,  it  would  also  assist 
us  in  understanding  how  normally  developing  readers  use  the  reading  vocabulary 
they  have  mastered  to  develop  strategies  to  identify  unfamiliar  words. 

In  order  to  explore  these  questions,  two  experiments  were  conducted.  In 
the  first  experiment,  development  of  children's  understanding  of  vowel  digraph 
pronunciation  was  the  focus.  First-,  third-,  and  fifth-grade  children  were 
required  to  read  aloud  high-  and  low-frequency  words  containing  vowel  digraph 
units  with  variant  and  invariant  pronunciations.  For  each  grade,  an  examina¬ 
tion  of  error  rate  and  of  the  characteristics  of  errors  was  conducted  to  ex¬ 
plore  the  effects  of  word  frequency,  of  alternate  pronunciations  for  vowel  di¬ 
graph  units,  and  of  consistency  of  orthographic  neighborhood  on  word  reading 
accuracy.  The  second  experiment  investigated  other  influences  on  vowel  di¬ 
graph  reading  using  pseudowords  containing  vowel  digraph  units  that  have  vari¬ 
ant  pronunciations  in  words.  By  eliminating  the  factor  of  word  familiarity, 
pronunciation  preferences  for  vowel  digraph  units,  as  well  as  factors 
influencing  those  pronunciations,  could  be  studied  and  the  results  compared 
with  those  obtained  on  the  real  word  reading  task. 

General  Method 


Subjects 

The  subjects  in  the  first  experiment  were  children  from  the  first-, 
third-,  and  fifth-grade  classes  of  a  suburban  public  school  system  in  Connect¬ 
icut.  Following  a  review  of  teacher  ratings  for  reading  achievement  for  the 
first  and  third  graders,  and  teacher  ratings  and  group  reading  achievement 
test scores for the fifth graders, a pool of subjects, all average or above
average readers, was identified. The final population consisted of 90
students, 30 from each grade level. The subjects participating in the second
experiment  were  the  30  third-grade  children  who  had  participated  in  Experiment 
1.  All  subjects  selected  were  native  English  speakers  with  no  known  hearing 
or  vision  impairments. 

Procedure 

The  children  were  tested  individually  in  two  30-min  sessions.  During  the 
first  session,  the  experimental  word  reading  task  was  presented.  The  words 
were  typed  in  lower  case  primary  type  on  x  6"  file  cards  secured  in  a  ring 
binder.  The  stimuli  were  presented  in  random  order  with  20  filler  words, 
which  were  single  syllable  items  selected  from  the  reading  subtest  of  the  Wide 
Range  Achievement  Test  (Jastak,  Bijou,  &  Jastak,  1978).  These  filler  words 
were included in order that the randomization satisfy the constraint that
words with the same vowel sound not precede one another, thus minimizing
possible priming effects. Subjects were instructed to read each word orally
and then to turn to the following card. Approximately two weeks after the
initial session, a second session was held for the third-grade children during
which the experimental pseudoword reading task was presented. Subjects were
informed that these words were nonsense or "pretend" words and that they
should  not  attempt  to  make  real  words  out  of  the  items.  They  were  instructed 
to  read  each  word  orally  and  to  turn  to  the  following  card  after  reading  each 
word.  All  pronunciations  were  recorded  on  tape  for  later  transcription  and 
analysis. 
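
The ordering constraint described in the procedure (no two successive items sharing a vowel sound) can be sketched as a simple rejection-sampling shuffle. This is only one way such an order could be produced and is not claimed to be the authors' method; the function name, the vowel labels, and the six-word sample (taken from Table 1) are assumptions for illustration.

```python
import random


def constrained_order(items, vowel_of, rng, max_tries=10_000):
    """Reshuffle until no two adjacent items share a vowel sound."""
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(vowel_of[a] != vowel_of[b] for a, b in zip(order, order[1:])):
            return order
    # With too few distinct vowel sounds the constraint may be unsatisfiable;
    # adding filler items with other vowels is one remedy.
    raise RuntimeError("no valid order found; add fillers or relax the constraint")


# Hypothetical vowel labels for a handful of stimulus words.
VOWEL = {
    "road": "oh", "coal": "oh",
    "paint": "ay", "main": "ay",
    "drew": "oo", "flew": "oo",
}

order = constrained_order(VOWEL.keys(), VOWEL, random.Random(0))
print(order)
```

Rejection sampling is adequate for short lists like this one; for long lists with few vowel categories, interleaving fillers before shuffling keeps the number of retries small.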

Zinna  et  al.:  Sensitivity  to  Factors  Influencing  Vowel  Reading 


Experiment  1 

Materials 

Two lists of monosyllabic real words, including 72 items in all, were
developed. One list, as displayed in Table 1, included words containing vowel
digraph units with invariant phonological correspondences: ee, oa, oi, ai, and
ew. The other list, as displayed in Table 2, included words containing units
with variant correspondences: ea, ou, ie, ow, and oo. Words were selected to
vary in two respects: frequency and variability of pronunciation of the vowel
digraph unit. Frequency was determined by the occurrence of the words in
reading material at the third-grade level as indicated in the American
Heritage Word Frequency Listings (Carroll, Davies, & Richman, 1971).
Classification according to variant or invariant pronunciation was based on
the pronunciations reported in a thorough listing (Fischer, 1979) of
monosyllabic English words containing vowel digraphs. Both word frequency and
pronunciation variability were systematically controlled in both stimulus
lists.

In  addition,  as  indicated  in  Table  2,  for  each  monosyllabic  word  contain¬ 
ing  a  vowel  digraph  unit  with  a  variant  pronunciation,  the  word's  orthographic 
neighborhood  was  determined  from  the  Fischer  set  in  the  manner  of  Glushko 
(1979).  This  determination  was  made  for  both  high-  and  low-frequency  words. 
Each word with a vowel digraph-final consonant unit that is pronounced
the same way in all monosyllabic words sharing that structure was considered
to  have  a  consistent  orthographic  neighborhood.  In  contrast,  each  word  with  a 
vowel  digraph-final  consonant  unit  that  is  pronounced  differently  in  at  least 
one  other  monosyllabic  word  sharing  that  structure  was  considered  to  have  an 
inconsistent  orthographic  neighborhood. 

Results  and  Discussion 

Because  the  variance  in  performance  was  substantially  greater  for  the 
first  graders  than  for  the  third  and  fifth  graders,  a  separate  analysis  was 
carried  out  for  each  grade  level  group.  Mean  percentages  of  correct  re¬ 
sponses,  possible  pronunciation  responses,  and  error  responses  were  calculated 
for  each  grade  on  each  word  category.  These  data  appear  in  Table  3.  The  data 
for  each  grade  were  subjected  to  two  separate  factorial  analysis  of  variance 
procedures.  The  first  analysis  examined  factors  of  word  frequency  and 
pronunciation  variability  for  the  vowel  digraph  unit  for  the  entire  set  of 
stimuli.  In  the  second  analysis  the  factors  of  word  frequency  and  consistency 
or  inconsistency  of  the  orthographic  neighborhood  were  examined  for  the  vari¬ 
ant  pronunciation  set  of  words. 

Effects  of  Frequency  and  Vowel  Digraph  Pronunciation 

First graders. The analysis of the first graders' data revealed, as
expected, a significant main effect for word frequency, F(1,29) = 45.89, p <
.0001. As illustrated in Table 3, these children correctly identified 65% and
63% of the high-frequency words containing vowel digraph units with variant
and invariant pronunciations, respectively. Thus, it appears likely that the
first graders employed a holistic word reading strategy. In contrast,
identification was correct for only 50% of the low-frequency words containing
vowel digraph units with invariant pronunciations and 43% of the low-frequency
words containing vowel digraph units with variant pronunciations. Thus, while
these first-grade readers correctly identified nearly two-thirds of the
high-frequency words containing vowel digraph units with invariant
pronunciations, they did not generalize that knowledge in assigning the
correct pronunciation to identical vowel digraph units with invariant
pronunciations embedded in the less familiar, low-frequency words.

Table 1

Real-Word Stimulus Items with Invariant Pronunciations of the Vowel Digraphs
(Experiment 1)

High Frequency      Low Frequency

green               sleek
street              breed
road                oat
coal                boast
soil                toil
join                joint
paint               ail
main                trait
drew                dew
flew                slew

Table 2

Real-Word Stimulus Items with Variant Pronunciations of the Vowel Digraphs
(Experiment 1)

Column headings: Consistent Orthographic Neighborhood (High Frequency, Low
Frequency); Inconsistent Orthographic Neighborhood (High Frequency, Low
Frequency)

Items: beach, ream, read, tread, clean, dean, speak, steak, break, teak, head,
plead, young, mount, mouth, youth, found, spout, touch, slouch, group, vouch,
proud, soul, tried, fried, owl, flown, piece, niece, how, tow, pie, lied,
bowl, jowl, field, shield, low, pow, soon, croon, foot, loot, room, sloop,
food, hood, good, mood

Further  analysis  suggests  that  the  first-grade  readers  were  nonetheless 
beginning  to  acquire  an  awareness  of  alternate  pronunciations  for  vowel  di¬ 
graph  units  with  variant  pronunciations.  In  reading  high-frequency  words  con¬ 
taining  vowel  digraph  units  with  variant  pronunciations,  first  graders,  as 
noted above, correctly identified 65% of the words; however, 58% of their
error responses consisted of substitutions of possible alternate pronunciations
for  that  vowel  digraph  unit.  Error  data  obtained  from  their  reading  of 
low-frequency  words  containing  vowel  digraph  units  with  variant  pronunciations 
offer  corroborative  evidence  for  this  finding.  Although  the  overall  error 
rate  for  reading  low-frequency  words  containing  vowel  digraph  units  with  vari¬ 
ant  pronunciations  was  substantially  greater  than  that  obtained  for  the 
high-frequency words, 53% of these errors (again greater than one-half of the
total)  consisted  of  substitutions  of  possible  alternate  pronunciations  for  the 
vowel  digraph  unit. 

Third  graders.  As  was  the  case  for  first  graders,  analysis  of 
third-grade  data  again  revealed  a  significant  main  effect  for  frequency, 
F(1,29) = 55.46, p < .0001. In addition, a significant main effect for
pronunciation for the vowel digraph unit, not present in the analysis of the
first-grade data, was obtained with the third graders, F(1,29) = 59.98, p <
.0001. As illustrated on the left in Figure 1, an interaction between word
frequency and pronunciation for the vowel digraph unit was obtained, F(1,29) =
23.54, p < .0001.

Like  the  first  graders,  the  third-grade  readers  read  high-frequency  words 
containing  vowel  digraph  units  with  variant  and  invariant  pronunciations 
equally  well,  though  with  greater  accuracy  than  the  first  graders,  correctly 
identifying 96% and 98% of words in these categories, respectively. In
contrast to the first graders, the third graders read low-frequency words with
invariant  pronunciations  for  the  vowel  digraph  unit  with  accuracy  comparable 
to that obtained for the high-frequency words. They correctly identified 92%
of  the  low-frequency  words  of  that  orthographic  type,  suggesting  that  they  had 
been  successful  in  identifying  the  systematic  relationship  between  pronuncia¬ 
tion  and  orthographic  structure  among  the  words  in  their  reading  vocabulary. 
Less  dependent  on  previous  knowledge  of  specific  words,  the  third  graders 
demonstrated  skill  in  generalizing  knowledge  of  proper  pronunciations  of 
invariant  vowel  digraph  units  when  those  units  appeared  in  the  context  of  un¬ 
familiar,  low-frequency  words. 

In contrast to this performance on the invariant units, the third graders
were able to read accurately only 79% of the low-frequency words containing
vowel digraph units with variant pronunciations. Nonetheless, their overall
error rate in this category (21%) was substantially lower than that of the
first graders (57%). As in the first-grade pattern, however, a majority of
their errors (82%) consisted of substitutions of possible alternate
pronunciations for the vowel digraph unit. As illustrated in Table 3, while
the error rate declined from the first to the third grade, the ratio of
substitutions of possible alternate pronunciations to errors increased. Once
again, the third-grade readers demonstrated skill in generalizing knowledge of
pronunciations for vowel digraph units to unfamiliar, low-frequency words.

Table 3

Frequencies and Percentages of Correct and Incorrect Responses for Real Words
Containing Variant and Invariant Vowel Digraph Units (Experiment 1)

                                   Grade 1       Grade 3       Grade 5
                                 Freq.     %   Freq.     %   Freq.     %

Variant Unit
  High-Frequency Words
    Total Correct                  509    65     751    96     766    98
    Errors
      Possible Pronunciations      158    20      25     3      14     2
      Impossible Pronunciations    113    15       4     1       0     0
  Low-Frequency Words
    Total Correct                  332    43     618    79     712    91
    Errors
      Possible Pronunciations      235    30     133    17      57     7
      Impossible Pronunciations    213    27      29     4      11     1

Invariant Unit
  High-Frequency Words
    Total Correct                  189    63     294    98     300   100
    Errors                         111    37       6     2       0     0
  Low-Frequency Words
    Total Correct                  149    50     275    92     296    99
    Errors                         151    50      25     8       4     1

Figure 1. Performance of third and fifth graders on reading low-frequency and
high-frequency words, plotted in mean percent correct. (Left panel: Grade 3;
right panel: Grade 5. Horizontal axis: word frequency, low versus high.
Separate lines: invariant versus variant vowel digraph pronunciation.)

Fifth  graders.  The  main  effects  for  word  frequency  and  variant  versus 
invariant  pronunciation  for  the  vowel  digraph  unit  were  again  revealed  in  the 
analysis of the fifth-grade data, F(1,29) = 38.40, p < .0001, and F(1,29) =
59.39, p < .0001, respectively. As illustrated on the right in Figure 1, an
interaction between frequency and pronunciation for the vowel digraph unit was
again obtained, F(1,29) = 26.51, p < .0001. Though their performance was more
accurate overall, the pattern of the fifth graders was similar in one respect
to that of both earlier grades. That is, they read high-frequency words
containing vowel digraph units with variant and invariant pronunciations
equally well, correctly identifying 98% and 100% of the words of these
categories,
respectively. 

As  observed  previously  with  the  third  graders,  the  fifth  graders  success¬ 
fully  identified  the  systematic  relationship  between  pronunciation  and  ortho¬ 
graphic  structure.  Thus,  they  were  able  to  generalize  that  knowledge  to  the 
identification  of  words  of  lower  frequency  containing  these  invariant  units, 
correctly identifying 99% of the words of this category. As was the case with
the third graders, the fifth graders' reading of low-frequency words
containing vowel digraph units with variant pronunciations was poorer than
their reading of high-frequency words of that type: 91% of the words of this
category were correctly identified. Most of their errors (87%) consisted of
substitutions of possible alternate pronunciations for the vowel digraph unit
embedded within these words, a slightly greater percentage of such
substitutions than in the third grade (82%).

Summary.  The  analysis  confirms  the  expectation  that  children's  accuracy 
in  word  reading  would  be  favorably  enhanced  by  high  word  frequency,  regardless 
of  the  number  of  alternate  pronunciations  for  the  vowel  digraph  unit  contained 
within  these  words.  In  addition,  the  highly  accurate  performance  of  the  third 
and  fifth  graders  in  reading  low-frequency  words  containing  vowel  digraph 
units with invariant pronunciations supports the hypothesis that with reading
experience, children identify the systematic relationship between
pronunciation and orthographic structure and utilize that knowledge in the
pronunciation of unfamiliar words. Finally, the proportion of substitutions of
possible alternate pronunciations among the errors, which rose
with  increasing  grade  level,  provides  further  evidence  that  as  children  devel¬ 
op  reading  skill  they  identify  the  systematic  relationship  between  pronuncia¬ 
tion  and  orthographic  structure. 

Effects  of  Frequency  and  Orthographic  Neighborhood  Consistency 

A  second  analysis  was  conducted  to  examine  the  possibility  that  the  error 
rate  on  categories  of  words  that  contained  vowel  digraph  units  with  variant 
pronunciations  was  affected  by  the  consistency  of  the  orthographic  neighbor¬ 
hood  of  individual  words.  Mean  percentages  of  correct  responses  were 
calculated  for  each  grade  level  group  on  each  word  category.  These  data  ap¬ 
pear  in  Table  4. 

First graders. For the first graders, the analysis revealed a significant
main effect for frequency, F(1,29) = 75.12, p < .0001. As indicated in
Table 4, they correctly identified 62% and 68% of the high-frequency words
from consistent and inconsistent orthographic neighborhoods, respectively. In
contrast, they correctly identified only 44% and 42% of the low-frequency
words from consistent and inconsistent orthographic neighborhoods,
respectively. Once again, word frequency was the most predictive index of word
reading accuracy.

Table 4

Mean Percentage of Correct Responses for High- and Low-Frequency Words
Containing Variant Vowel Digraph Units from Consistent and Inconsistent
Orthographic Neighborhoods (Experiment 1)

                               Orthographic Neighborhoods
                            Consistent          Inconsistent
Grade                       1     3     5       1     3     5

Variant Unit
  High-Frequency Words
    % Correct              62    98    99      68    95    97
  Low-Frequency Words
    % Correct              44    91    96      42    72    89

Third graders. Analysis of the third-grade data also revealed a significant
main effect for word frequency, F(1,29) = 76.79, p < .0001. However, a
significant main effect for orthographic neighborhood consistency, not found
in the analysis of the first-grade data, was also obtained, F(1,29) = 88.87,
p < .0001. As illustrated on the left of Figure 2, a significant interaction
occurred between word frequency and orthographic neighborhood consistency,
F(1,29) = 21.12, p < .0001. Like the first graders, the third-grade readers
read high-frequency words from consistent and inconsistent orthographic
neighborhoods equally well, though with greater accuracy than the first
graders, correctly identifying 98% and 95% of words from these categories,
respectively. When low-frequency words were presented, however, in contrast
to the first graders' error pattern, those words from consistent orthographic
neighborhoods were read with accuracy comparable to that obtained for the
high-frequency words. The third graders correctly identified 91% of the
low-frequency words from consistent orthographic neighborhoods, in contrast to
correct identification of only 72% of the low-frequency words from
inconsistent orthographic neighborhoods. This result suggests that the third
graders, but not the first graders, have developed a reading vocabulary
sufficient to provide a data base from which to determine the relations
between orthographic structure and pronunciation.
[Figure 2 (plot not reproduced): two panels, Grade 3 (left) and Grade 5
(right). Vertical axis: mean percent correct. Horizontal axis: low-frequency
and high-frequency words. Separate lines for consistent and inconsistent
orthographic neighborhoods.]

Figure 2. Performance of third and fifth graders on reading low-frequency and
high-frequency words with variant vowel digraph units, plotted in
mean percent correct.


Fifth graders. Main effects for word frequency and orthographic
neighborhood consistency were once again found in the analysis of the
fifth-grade data, F(1,29) = 31.47, p < .0001, and F(1,29) = 33.29, p < .0001,
respectively. As illustrated on the right in Figure 2, a significant
interaction between word frequency and orthographic neighborhood consistency
was again obtained, F(1,29) = 9.64, p < .0042. Though more accurate than the
first and third graders, the fifth graders also read high-frequency words from
consistent and inconsistent orthographic neighborhoods equally well, correctly
identifying 99% and 97% of words of these categories, respectively. Like the
third graders, the fifth graders, when presented with low-frequency words from
consistent and inconsistent orthographic neighborhoods, read words from
consistent neighborhoods with accuracy close to that obtained for the
high-frequency words. They correctly identified 96% of the low-frequency words
from consistent orthographic neighborhoods, as contrasted with correct
identification of 89% of the low-frequency words from inconsistent orthographic
neighborhoods. Once again, support is provided for the contention that the
analysis of interword relations and awareness of consistencies and
inconsistencies between orthographic structure and pronunciation, in this case
the vowel digraph-final consonant structure, provide the reader with the
knowledge necessary to pronounce an unfamiliar word correctly.
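
The frequency-by-consistency interactions reported for the third and fifth
graders can be read off the cell means in Table 4 as a difference of
differences. The following Python sketch only illustrates that arithmetic. The
function and variable names are ours, and because it works from the rounded
cell means rather than the individual scores, it does not reproduce the F
tests.

```python
# Cell means (percent correct) for Grades 3 and 5, copied from Table 4.
# Keys are (word frequency, orthographic neighborhood); names are illustrative.
TABLE_4 = {
    3: {("high", "consistent"): 98, ("high", "inconsistent"): 95,
        ("low", "consistent"): 91, ("low", "inconsistent"): 72},
    5: {("high", "consistent"): 99, ("high", "inconsistent"): 97,
        ("low", "consistent"): 96, ("low", "inconsistent"): 89},
}


def consistency_cost(means, frequency):
    """Accuracy lost when moving from consistent to inconsistent neighborhoods."""
    return means[(frequency, "consistent")] - means[(frequency, "inconsistent")]


def interaction_contrast(means):
    """Consistency cost for low-frequency words minus that for high-frequency words."""
    return consistency_cost(means, "low") - consistency_cost(means, "high")


if __name__ == "__main__":
    for grade, means in TABLE_4.items():
        print(grade,
              consistency_cost(means, "high"),
              consistency_cost(means, "low"),
              interaction_contrast(means))
```

For Grade 3 the cost of an inconsistent neighborhood is 3 percentage points
for high-frequency words and 19 points for low-frequency words; for Grade 5
the corresponding costs are 2 and 7 points.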
Experiment 2

The results of Experiment 1 provide evidence that older readers' accuracy
and error rate in reading real words containing vowel digraph units with
variant pronunciations were influenced by the consistency of pronunciation of
other words sharing the particular vowel digraph-final consonant unit. To
examine this effect further and to begin exploring the effect of the initial
consonant-vowel digraph unit on pronunciation selection, a second experiment
was conducted. In this experiment, the third-grade children who had
participated in the first experiment were asked to read monosyllabic
pseudowords containing vowel digraph units with variant pronunciations. By
eliminating the possibility of word familiarity, it was anticipated that
factors influencing reading would be more unequivocally revealed.

Materials 


A list of 60 monosyllabic pseudowords was developed that contained vowel
digraph units with variant pronunciations: ea, oo, ou, ow, and ie. Each
pseudoword consisted of initial and final segments that might appear in real
words. The initial consonant-vowel digraph segment and the vowel digraph-final
consonant segment in each of the pseudowords represented a legitimate sequence
in English phonology. However, vowel digraph segments in the pseudowords
might have different pronunciations in different real word contexts. For
example, the ou unit in the pseudoword moung might be rendered like the ou in
mouth or the ou in young. For pseudowords constructed in this manner, each
item was reviewed to determine the consistency of pronunciation among
monosyllabic real words sharing the vowel digraph-final consonant unit. Of the
60 items, 36 pseudowords were determined to have consistent orthographic
neighborhoods, as evidenced by the uniformity of pronunciation among
monosyllabic real words sharing the particular vowel digraph-final consonant
structure (Fischer, 1979). The remaining 24 items were determined to have
inconsistent orthographic neighborhoods, as evidenced by the lack of uniformity
of pronunciation among monosyllabic real words sharing the particular vowel
digraph-final consonant structure (Fischer, 1979). The final pseudoword lists
are included in Tables 5 and 6.
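
The consistency designation described above amounts to a simple check over the
real words that share a vowel digraph-final consonant unit. The following
Python sketch illustrates that check under stated assumptions: the small
lexicon, its rough vowel labels, and the function names are ours, chosen for
illustration, and are not the word lists or the procedure of Fischer (1979).

```python
from collections import defaultdict

# Hypothetical mini-lexicon: (real word, final unit, rough vowel label).
# "u" is the vowel of moon, "U" the vowel of book, "i" of beach, "e" of head.
LEXICON = [
    ("moon", "oon", "u"), ("soon", "oon", "u"), ("spoon", "oon", "u"),
    ("book", "ook", "U"), ("look", "ook", "U"), ("spook", "ook", "u"),
    ("beach", "each", "i"), ("teach", "each", "i"),
    ("bead", "ead", "i"), ("head", "ead", "e"), ("bread", "ead", "e"),
]


def neighborhoods(lexicon):
    """Map each final unit to the set of vowel pronunciations found in real words."""
    table = defaultdict(set)
    for _word, unit, vowel in lexicon:
        table[unit].add(vowel)
    return dict(table)


def classify(unit, table):
    """Label a unit 'consistent' when all of its neighbors share one vowel."""
    vowels = table.get(unit)
    if not vowels:
        return "unknown"  # no real-word neighbors in this lexicon
    return "consistent" if len(vowels) == 1 else "inconsistent"


if __name__ == "__main__":
    table = neighborhoods(LEXICON)
    for unit in ("oon", "ook", "each", "ead"):
        print(unit, classify(unit, table))
```

With this toy lexicon, -oon and -each come out consistent and -ook and -ead
inconsistent, matching the designations used for those units in the study.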


Results  and  Discussion 

The  pronunciation  preferences  of  the  30  third  graders  for  reading  each  of 
the  pseudowords  are  listed  as  percentages  in  Tables  5  and  6.  Vowel  di¬ 
graph-final  consonant  units,  which  were  determined  to  have  consistent  ortho¬ 
graphic  neighborhoods  because  of  their  uniform  pronunciation  in  monosyllabic 
real  words,  are  listed  in  Table  5.  Items  determined  to  have  inconsistent 
orthographic  neighborhoods,  based  upon  the  lack  of  such  uniformity,  appear  in 
Table  6. 


Influence  of  the  Vowel  Digraph-Final  Consonant  Unit 


It is evident from Tables 5 and 6 that pronunciations for pseudowords
containing the vowel digraph units oo and ea tended to vary with the
designation of their orthographic neighborhood as consistent or inconsistent.
Pseudoword items containing the units -ooth, -oom, -oon, and -each, -ean, and
-eam, all considered to have consistent orthographic neighborhoods, were
usually pronounced as /u/ for the former and /i/ for the latter. These
pronunciations occurred in never fewer than 90% of the cases. In contrast,
the units -ool, -ood, -ook, and -ead, -eat, and -eak, all considered to have
inconsistent orthographic neighborhoods, were the source of considerable
variation in pronunciation. Pseudowords containing the oo unit received the
/u/ pronunciation in between 50% and 97% of the cases; items containing the ea
unit received the /i/ pronunciation in between 60% and 97% of the cases.


Table 5

Percentages of Total Responses to Each Item from Consistent Neighborhoods by
Vowel Digraph Pronunciation (Experiment 2)

Consistent Orthographic Neighborhood

                 Responses                Other Responses
Item           /u/     /ʊ/                or Errors

mooth           90       0                   10
looth           94       3                    3
troom           97       3                    0
poom            94       3                    3
shoon           90       3                    7
smoon          100       0                    0
woon            93       0                    7

Item           /i/     /ei/    /e/        Other Responses or Errors

meach           94       3       0            3
slean           97       0       3            0
chean           97       0       0            3
team            94       3       0            3

[Column headings for the following items are missing in the source.]

drief           67      33       0
tiece           57      40       3
criece          60      40       0
biece           60      37       3
fiece           70      27       3

Certain units elicited the greatest variation in pronunciation. For
example, the realizations for the unit oo followed by k were evenly
distributed between /u/ and /ʊ/. For each of these items, the initial word
segments moo- and zoo- were words likely to be in a third-grade child's reading
vocabulary. However, the highly frequent words book and look, also likely to
be in a young child's reading vocabulary, provide the dominant pronunciation
for the unit -ook as it appears in monosyllabic real words. These factors, in
addition to these items' inconsistent orthographic neighborhood, may account
for the pronunciation alternation.
Table 6

Percentages of Total Responses to Each Item from Inconsistent Orthographic
Neighborhoods by Vowel Digraph Pronunciation (Experiment 2)

Inconsistent Orthographic Neighborhood

                 Responses                        Other Responses
Item           /u/     /ʊ/                        or Errors

bool            80       7                           13
smood           97       3                            0
tood            73      17                           10
zook            54      43                            3
mook            50      47                            3

Item           /i/     /ei/    /e/                Other Responses or Errors

stread          60       0      40                    0
dead            80       0      17                    3
chead           77       0      23                    0
steat           97       0       3                    0
preat           90       3       3                    3
dreak           70      13      10                    7
heak            94       0       3                    3
treak           94       0       3                    3

Item           /u/     /o/     /au/    /ʌ/        Other Responses or Errors

touth           30      17      43       3            7
mouch            0       7      80      13            0
Influence of the Initial Consonant-Vowel Digraph Unit

In contrast to the even distribution of pronunciation selections for both
pseudoword items ending in -ook is the inconsistency in assignment of
pronunciation to several other pseudoword items containing identical vowel
digraph-final consonant structures. For example, similar variation in
pronunciation might be expected for the three items ending in -ead, a unit
with an inconsistent orthographic neighborhood. Instead, the ea unit in the
pseudoword dead was rendered as /i/ 80% of the time, whereas in the item
stread, it was similarly rendered only 60% of the time. It seems likely that
real words sharing the initial consonant-vowel digraph structure may be
biasing the pronunciation of the pseudoword, but a final determination must
await further study.

As indicated in Table 5, the consistency of pronunciation expected for
the ou unit in pseudowords ending in -oup, -oud, and -ound, considered to have
consistent neighborhoods and expected to be rendered as /u/, /au/, and /au/,
respectively, was not obtained. It may be that the paucity of words ending in
those structures in a third-grade child's reading vocabulary reduced the
saliency of the vowel digraph-final consonant unit, allowing the initial word
segment to influence pronunciation. For example, the pseudowords proup and
cloup were expected to be rendered on the basis of the reader's knowledge of
words such as soup and group. Instead, the ou unit was frequently rendered as
/au/. As an explanation of that result, we would suggest that words such as
proud and cloud, which share the exact initial consonant-vowel digraph unit
with proup and cloup, may have been activated and contributed to the
unexpected pronunciation.

In view of that result, the apparent saliency of the /ʌ/ pronunciation
for the ou unit in moung is particularly notable. Though that pronunciation
occurs in English only in the single word, young, the ou unit embedded in the
pseudoword moung received the /ʌ/ pronunciation 70% of the time, despite
membership of the initial segment in a neighborhood containing mouth and
mountain. In contrast, the other pseudoword item containing the oung unit,
groung, received the /ʌ/ pronunciation only 37% of the time and the
pronunciation /au/, associated with the initial segment grou-, 50% of the time.

Mixed  Influence 

Additional evidence for the possibility that pronunciation selections
could be influenced by the initial consonant-vowel digraph unit was revealed
in the analysis of the ow unit in pseudowords. Any pseudoword containing the
ow unit, whether it ended a word or was combined with "l" as in -owl or "n" as
in -own, was considered to have an inconsistent orthographic neighborhood.
Pronunciations of pseudowords containing the ow unit reflected that
inconsistency, with the exception of the ow in the items cown and hown. The ow
unit in these words was rendered as /au/ in 100% and 97% of the cases,
respectively. In each of these instances, the initial word segment consisted
of a morpheme, the pronunciation of which was not overridden by the
pronunciation inconsistency of the final unit -own. In addition, words likely
to be present in a third-grade child's reading vocabulary, down, brown, and
town, provide identical pronunciations for the ow unit and share the -own
structure, probably accounting for the consistent rendering of these items.
Pseudowords containing the vowel digraph unit ie were all expected to
reflect their consistent orthographic neighborhoods. The designation of
consistency was based as always on the uniformity of rendering of the vowel
digraph-final consonant unit in similarly structured real words. However, in
the case of pseudowords containing the ie unit, this detection of neighborhood
consistency required that the reader respond to the affixation of plural and
past tense markers as a signal for the /ai/ pronunciation. The third-grade
readers in this study were able to identify the ie unit in the pseudoword
items kie and nie as /ai/; yet their pronunciations for similar items with the
plural or past tense marker were variable. For example, the ie unit in the
items bries and fied received the /ai/ pronunciation in only 50% to 70% of
the cases.

The ie unit in pseudowords ending in -ield, -iece, and -ief was expected
to be pronounced as /i/ on the basis of knowledge of such words as field,
piece, and chief. A review of the responses indicates that items ending in
these units received the /i/ pronunciation in between 47% and 70% of the
cases. Evidently, pronunciation preferences are being influenced by experience
or instruction, but the design of the stimuli did not allow us to pinpoint
the source of the variation in pronunciation of the ie unit in that
context.


Summary. The results of Experiment 2 provide support for the influence
of  the  vowel  digraph-final  consonant  unit  in  determining  the  rendering  of  the 
vowel  in  English-like  pseudowords.  The  influence  of  this  unit  could  be  seen 
in  the  greater  uniformity  of  the  pronunciation  of  pseudowords  ending  in 
particular  vowel  digraph-final  consonant  units  from  consistent  orthographic 
neighborhoods.  In  instances  where  there  was  less  uniformity  in  pronunciation 
of  such  items,  the  influence  of  the  initial  segment  appears  to  account  for 
most  of  the  variability. 


General  Discussion 

Children's  acquisition  of  word  reading  skills  was  examined  with  particu¬ 
lar  emphasis  on  the  development  of  young  readers'  response  to  variant 
vs.  invariant  phonologic  associations  for  vowel  digraph  units,  the  use  of  the 
final  consonant  context  in  disambiguating  vowel  assignment  to  invariant  vowel 
digraph  units,  and  their  sensitivity  to  the  orthographic  neighborhood  consis¬ 
tency  of  that  vowel  digraph-final  consonant  structure. 

The  data  obtained  in  Experiment  1  indicate  that  the  word  reading  accuracy 
of  the  first-grade  children  was  strongly  affected  by  word  frequency,  but  not 
by  the  variation  in  pronunciation  of  the  vowel  digraph  unit.  This  finding 
supports  the  view  expressed  by  Gough  and  Hillinger  (1980)  that  initial 
acquisition  of  word  reading  skills  may  typically  be  accomplished  through  rote 
learning  with  the  result  that  frequently  encountered  words  are  usually  identi¬ 
fied  without  analysis  of  word  components. 

The  word  reading  accuracy  of  third  and  fifth  graders  was  also  affected  by 
word  frequency,  but  in  addition,  the  older  readers  read  low-frequency  words 
containing  vowel  digraph  units  with  invariant  pronunciations  with  accuracy 
comparable  to  that  obtained  for  the  high-frequency  words.  This  effect  is  con¬ 
sistent  with  results  of  earlier  studies  (Fowler  et  al.,  1979;  Venezky  &  John¬ 
son,  1973;  Venezky  &  Massaro,  1979)  demonstrating  children's  ability  to  gener¬ 
alize  knowledge  of  orthographic  patterns  beyond  the  words  in  which  they  were 
originally encountered. In contrast, low-frequency words containing vowel
digraph units with variant pronunciations were a significant source of error
even  for  the  older  readers. 

When  these  low-frequency  words  were  further  categorized  by  consistency  or 
inconsistency  of  their  orthographic  neighborhoods,  those  from  consistent 
orthographic  neighborhoods  were  read  by  the  third  and  fifth  graders  with  a 
level  of  accuracy  close  to  that  obtained  for  both  high-frequency  words  and 
those of low frequency that contained invariant vowel digraph units. For the
children  in  the  higher  grades,  only  the  low-frequency  words  containing  variant 
vowel  digraph  units  with  inconsistent  orthographic  neighborhoods  were  a 
substantial  source  of  error.  These  results  provide  support  for  a  model  in 
which  the  final  consonant  predicts  vowel  digraph  pronunciation  preferences 
(Johnson  &  Venezky,  1976;  Ryder  &  Pearson,  1980).  They  also  support  the  hy¬ 
pothesis  (Glushko,  1979)  that  the  ability  to  read  the  vowel  in  words  is 
affected  by  the  consistency  of  pronunciation  of  words  sharing  a  particular  me¬ 
dial  vowel-final  letter  unit.  Despite  some  exceptions,  these  findings  speak 
to  the  special  salience  of  the  vowel  digraph-final  consonant  unit  in 
disambiguating  vowel  pronunciation. 

In  the  second  experiment,  pseudoword  stimulus  items  were  used  to  allow  us 
to  explore  further  the  influence  of  the  neighboring  orthographic  segments  on 
vowel  pronunciation.  It  was  found  that  whereas  the  orthographic  neighborhood 
consistency  effect,  as  defined  for  medial  vowel-final  letter  units,  was  ob¬ 
tained  for  many  pseudoword  items,  the  pronunciation  of  others  was  not 
disambiguated  by  the  consistent  pronunciation  of  the  vowel  digraph-final  con¬ 
sonant  of  that  item.  This  result  was  observed  on  the  items  proup  and  cloup, 
in  which  the  ou  unit  was  frequently  pronounced  as  /au/,  despite  the  consisten¬ 
cy  of  pronunciation  evidenced  by  the  -oup  unit  as  it  appears  in  real  words. 
Many  of  these  exceptions  were  rationalized  by  considering  possible  interfer¬ 
ence  from  initial  consonant-vowel  digraph  occurrences  in  familiar  real  words. 
These  cases  suggest  that  in  future  work  it  will  be  desirable  to  expand  the 
concept  of  neighborhood  consistency  to  examine  influences  from  the  initial 
portion  of  the  word  as  well  as  of  the  final. 

One  possible  explanation  for  the  results  is  the  operation  of  a 
left-to-right  letter  string  parser  (Marcel,  1980).  Marcel  proposed  that  when 
a  word  or  pseudoword  is  presented  to  a  reader,  the  letter  string  is  segmented 
in  all  possible  ways.  Each  word  segment,  as  it  is  parsed,  automatically 
activates  the  pronunciations  of  that  unit  as  it  occurs  in  different  words. 
Thus,  for  the  young  reader  the  pronunciation  activated  for  the  word  segments 
prou-  and  clou-  may  result  from  the  words  proud  and  cloud  in  their  reading 
vocabularies.  Word  pronunciation  may  result  from  the  parsing  of  successive 
units  of  the  letter  string,  during  which  the  pronunciation  of  later  appearing 
segments  may  override  the  pronunciation  of  prior  segments  (Baron  &  Strawson, 
1976; Marcel, 1980). For the young reader, then, it may be that the
association between the unit ou and the /au/ pronunciation was too
strong to be overridden by the pronunciation of -oup as it appears in the
words soup and group.

The proposal put forth by Marcel (1980) also explains the pronunciation
of the ou unit in the item moung as /ʌ/. According to that explanation, as a
child attempts pronunciation of the pseudoword moung, the initial segment
parsed is mou-, the ou unit likely to be pronounced as /au/ on the basis of
knowledge of words such as mouth and mountain. When the child parses the
final segment of the letter string, -oung, however, a different pronunciation
for that unit is activated on the basis of the occurrence of that unit in the
word young. As it happened, the pronunciation of the ou unit in the pseudoword
moung was frequently /ʌ/, attesting to the strong effect the final word
segment maintains over word pronunciation.
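
The activation step in this account can be made concrete with a small sketch.
The Python code below is a deliberately simplified illustration, not Marcel's
(1980) model: it considers only two segments of the letter string (the initial
consonant-vowel digraph segment and the vowel digraph-final consonant segment)
rather than all possible segmentations, it does not decide which pronunciation
wins, and the seven-word vocabulary and all names are our own assumptions.

```python
# Hypothetical reading vocabulary: real word -> pronunciation of its ou unit.
VOCABULARY = {
    "proud": "au", "cloud": "au", "mouth": "au", "mountain": "au",
    "soup": "u", "group": "u", "young": "ʌ",
}


def activated_pronunciations(letter_string, digraph, vocabulary):
    """List the known words, with their vowel pronunciations, that share the
    initial segment or the final segment of letter_string."""
    start = letter_string.index(digraph)
    initial = letter_string[:start + len(digraph)]   # e.g. "mou" in "moung"
    final = letter_string[start:]                    # e.g. "oung" in "moung"
    from_initial = sorted((w, v) for w, v in vocabulary.items()
                          if w.startswith(initial))
    from_final = sorted((w, v) for w, v in vocabulary.items()
                        if w.endswith(final))
    return {"initial": (initial, from_initial), "final": (final, from_final)}


if __name__ == "__main__":
    for item in ("proup", "moung"):
        print(item, activated_pronunciations(item, "ou", VOCABULARY))
```

For proup, the initial segment prou- activates proud (/au/) while the final
segment -oup activates soup and group (/u/); for moung, mou- activates mouth
and mountain (/au/) while -oung activates young (/ʌ/). Which candidate prevails
is exactly what the pseudoword data are used to examine.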

A  left-to-right  parser,  with  capacity  to  override  and  disambiguate 
pronunciations  activated  for  earlier  segments  of  a  word,  would  require  that 
the  reader  have  a  substantial  reading  vocabulary  and  awareness  of  the  phonemic 
segmentation of the words in the lexicon. It has been well documented not only
that phonemic awareness is a predictor of reading achievement (Blachman, 1983;
Bryant & Bradley, 1980; Liberman, 1973; Lundberg, Olofsson, & Wall, 1980), but
also that this awareness is enhanced by reading experience and instruction
(Liberman, Liberman, Mattingly, & Shankweiler, 1980; Morais, Cary, Alegria, &
Bertelson, 1979). We may speculate, therefore, that the limited reading
vocabularies  of  the  first  graders,  in  combination  with  underdeveloped  phoneme 
awareness  and  segmenting  skills,  effectively  limit  the  amount  of  information 
that  most  first  graders  are  able  to  utilize  in  reading  new  words.  As  a  re¬ 
sult,  they  were  more  likely  to  identify  high-frequency  words  correctly  than 
low-frequency  words,  regardless  of  the  number  of  alternate  pronunciations  for 
the  vowel  digraph.  Insensitive  to  orthographic  neighborhood  consistency  or 
inconsistency,  the  first-grade  readers  were  unable  to  use  the  larger  vowel  di¬ 
graph-final  consonant  context  to  disambiguate  vowel  assignment  to  a  vowel  di¬ 
graph. 

We  must  ask  whether  this  result  may  be  an  artifact  of  instruction.  All 
children  participating  in  this  study  have  received  what  is  best  identified  as 
an  eclectic  approach  to  reading  instruction.  As  reported,  the  third  graders, 
and,  even  more  so,  the  fifth  graders,  had  developed  a  sensitivity  to  the 
orthographic  neighborhood  consistency,  taking  account  of  the  wider  vowel  di¬ 
graph-final  consonant  context  to  disambiguate  vowel  assignment  to  vowel  di¬ 
graphs.  Apparently  by  the  third  grade,  children  who  are  progressing  normally 
in  reading  have  acquired  a  corpus  of  words  in  their  reading  vocabularies  ade¬ 
quate  to  meet  the  demands  of  an  operation  that  requires  phoneme  awareness, 
segmenting  skill,  and  prior  word  knowledge  to  determine  the  pronunciation  of 
an  unfamiliar  word.  In  contrast,  the  first  graders,  as  they  learn  new  words, 
are  just  beginning  to  identify  phoneme  correspondences  of  individual  graphemes 
and  may  depend  heavily  on  these  to  identify  vowel  digraphs.  Thus,  their  re¬ 
sponses,  though  incorrect,  include  some  substitutions  that  are  possible  in 
certain  other  contexts. 

This  difference  between  the  performances  of  the  first  and  third  graders 
raises  critical  questions  for  future  investigation.  We  are  interested  to  know 
if, during that second year of formal reading instruction, children merely
acquire  a  more  extensive  reading  vocabulary  in  a  rote  manner,  or  if  they  begin 
then  to  analyze  interword  relations  identifying  consistencies  between  ortho¬ 
graphic  structures  larger  than  the  individual  letters  and  their  pronunciation. 
Moreover,  we  should  like  to  know  whether  different  methods  of  instruction  will 
make  a  difference  in  the  development  of  these  skills,  and  even  whether  there 
may  be  lasting  effects  of  such  instructional  differences.  In  addition,  our 
attention  must  turn  to  those  older  children  who  fail  to  acquire  automatic  word 
reading  skills.  Are  these  older,  poorer  readers  functioning  like  the 
first-grade  readers,  or  are  they  utilizing  different  information  to  determine 
the  pronunciation  of  an  unfamiliar  word?  It  is  clear  that  the  answers  to 
these  questions  will  further  our  understanding  of  reading  and  how  it  develops. 